1
Russell SL, Penunuri G, Condon C. Diverse genetic conflicts mediated by molecular mimicry and computational approaches to detect them. Semin Cell Dev Biol 2025; 165:1-12. [PMID: 39079455 DOI: 10.1016/j.semcdb.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 07/03/2024] [Accepted: 07/14/2024] [Indexed: 09/07/2024]
Abstract
In genetic conflicts between intergenomic and selfish elements, driver and killer elements achieve biased survival, replication, or transmission over sensitive and targeted elements through a wide range of molecular mechanisms, including mimicry. Driving mechanisms manifest at all organismal levels, from the biased propagation of individual genes, as demonstrated by transposable elements, to the biased transmission of genomes, as illustrated by viruses, to the biased transmission of cell lineages, as in cancer. Targeted genomes are vulnerable to molecular mimicry through the conserved motifs they use for their own signaling and regulation. Mimicking these motifs enables an intergenomic or selfish element to control core target processes, and can occur at the sequence, structure, or functional level. Molecular mimicry was first appreciated as an important phenomenon more than twenty years ago. Modern genomics technologies, databases, and machine learning approaches offer tremendous potential to study the distribution of molecular mimicry across genetic conflicts in nature. Here, we explore the theoretical expectations for molecular mimicry between conflicting genomes and the trends in molecular mimicry mechanisms across known genetic conflicts, and outline how new examples can be gleaned from population genomic datasets. We discuss how mimics involving short sequence-based motifs or gene duplications can evolve convergently from new mutations, whereas processes that involve divergent domains or fully folded structures occur among genomes by horizontal gene transfer. These trends are largely based on a small number of organisms and should be reevaluated in a general, phylogenetically independent framework. Currently, publicly available databases can be mined for genotypes driving non-Mendelian inheritance patterns, epistatic interactions, and convergent protein structures. A subset of these conflicting elements may be molecular mimics. We propose approaches for detecting genetic conflict and molecular mimicry from these datasets.
Affiliation(s)
- Shelbi L Russell
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States
- Gabriel Penunuri
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States
- Christopher Condon
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States
2
Belman S, Pesonen H, Croucher NJ, Bentley SD, Corander J. Estimating between-country migration in pneumococcal populations. G3 (BETHESDA, MD.) 2024; 14:jkae058. [PMID: 38507601 PMCID: PMC11152062 DOI: 10.1093/g3journal/jkae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 02/29/2024] [Accepted: 03/11/2024] [Indexed: 03/22/2024]
Abstract
Streptococcus pneumoniae (the pneumococcus) is a globally distributed, human obligate opportunistic bacterial pathogen which, although often carried commensally, is also a significant cause of invasive disease. Apart from multi-drug resistant and virulent clones, the rate and direction of pneumococcal dissemination between different countries remain largely unknown. The ability of the pneumococcus to gain a foothold in a country depends on the existing population configuration, the extent of vaccine implementation, and human mobility, since it is a human obligate bacterium. To shed light on its international movement, we used extensive genome data from the Global Pneumococcal Sequencing project and estimated migration parameters between multiple countries in Africa. Data on allele frequencies of polymorphisms at housekeeping-like loci for multiple different lineages circulating in the populations of South Africa, Malawi, Kenya, and The Gambia were used to calculate the fixation index (Fst) between countries. We then used these summaries to fit migration coalescent models with the likelihood-free inference algorithms available in the ELFI software package. Synthetic data were additionally used to validate the inference approach. Our results demonstrate country-pair specific migration patterns and heterogeneity in the extent of migration between different lineages. Our approach demonstrates that coalescent models can be effectively used for inferring migration rates for bacterial species and lineages provided sufficiently granular population genomics surveillance data. Further, it can demonstrate the connectivity of respiratory disease agents between countries to inform intervention policy in the longer term.
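As a rough illustration of the summary statistic named in this abstract, the sketch below computes a pairwise fixation index from per-locus allele frequencies in two populations. It assumes biallelic loci, equal population weights, and Wright's heterozygosity-based definition combined across loci as a ratio of sums. The abstract does not say which Fst estimator the authors used, and the function name `fst_two_populations` is hypothetical.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Pairwise Fst from per-locus allele frequencies (illustrative sketch).

    freqs_a, freqs_b: frequencies of the same reference allele at each
    biallelic locus in population A and population B.
    Assumption: equal population weights; Fst = (H_T - H_S) / H_T, with
    heterozygosities summed over loci before taking the ratio.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of loci")

    h_within = 0.0  # mean within-population expected heterozygosity (H_S)
    h_total = 0.0   # expected heterozygosity of the pooled population (H_T)
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_within += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        h_total += 2 * p_bar * (1 - p_bar)

    if h_total == 0:
        # No polymorphism at any locus: differentiation is undefined,
        # reported here as 0.0 by convention for this sketch.
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies for two countries at three loci.
    print(fst_two_populations([0.2, 0.5, 0.9], [0.8, 0.5, 0.7]))
```

Values near 0 indicate similar allele frequencies in the two populations, and values near 1 indicate nearly fixed differences. In a likelihood-free setting, a statistic of this kind would be computed on both observed and simulated data and compared.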
Affiliation(s)
- Sophie Belman
  - Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
- Henri Pesonen
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, 0372, Norway
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, White City Campus, Imperial College London, London W12 0BZ, UK
- Stephen D Bentley
  - Parasites and Microbes, Wellcome Sanger Institute, Hinxton, Cambridgeshire, CB10 1SA, UK
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, 0372, Norway
  - Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Espoo, Helsinki, 02150, Finland
3
Penunuri G, Wang P, Corbett-Detig R, Russell SL. A Structural Proteome Screen Identifies Protein Mimicry in Host-Microbe Systems. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.10.588793. [PMID: 38645127 PMCID: PMC11030372 DOI: 10.1101/2024.04.10.588793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Host-microbe systems are evolutionary niches that produce coevolved biological interactions and are a key component of global health. However, these systems have historically been a difficult field of biological research due to their experimental intractability. Impactful advances in global health will be obtained by leveraging in silico screens to identify genes involved in mediating interspecific interactions. These predictions will advance our understanding of these systems and lay the groundwork for future in vitro and in vivo experiments and bioengineering projects. A driver of host manipulation and intracellular survival utilized by host-associated microbes is molecular mimicry, a critical mechanism that can occur at any level from DNA to protein structures. We applied protein structure prediction and alignment tools to explore host-associated bacterial structural proteomes for examples of protein structure mimicry. By leveraging the Legionella pneumophila proteome and its many known structural mimics, we developed and validated a screen that can be applied to virtually any host-microbe system to uncover signals of protein mimicry. These mimics represent candidate proteins that mediate host interactions in microbial proteomes. We successfully applied this screen to other microbes with demonstrated effects on global health, Helicobacter pylori and Wolbachia, identifying protein mimic candidates in each proteome. We discuss the roles these candidates may play in important Wolbachia-induced phenotypes and show that Wolbachia infection can partially rescue the loss of one of these factors. This work demonstrates how a genome-wide screen for candidates of host manipulation and intracellular survival offers an opportunity to identify functionally important genes in host-microbe systems.
4
Li X, Habibipour S, Chou T, Yang OO. The role of APOBEC3-induced mutations in the differential evolution of monkeypox virus. Virus Evol 2023; 9:vead058. [PMID: 37841642 PMCID: PMC10569380 DOI: 10.1093/ve/vead058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 09/03/2023] [Accepted: 09/18/2023] [Indexed: 10/17/2023] Open
Abstract
Recent studies show that newly sampled monkeypox virus (MPXV) genomes exhibit mutations consistent with Apolipoprotein B mRNA Editing Catalytic Polypeptide-like3 (APOBEC3)-mediated editing compared to MPXV genomes collected earlier. It is unclear whether these single-nucleotide polymorphisms (SNPs) result from APOBEC3-induced editing or are a consequence of genetic drift within one or more MPXV animal reservoirs. We develop a simple method based on a generalization of the General-Time-Reversible model to show that the observed SNPs are likely the result of APOBEC3-induced editing. The statistical features allow us to extract lineage information and estimate evolutionary events.
Affiliation(s)
- Xiangting Li
  - Department of Computational Medicine, UCLA, Los Angeles, CA, United States
- Sara Habibipour
  - Departments of Medicine and Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, CA, United States
- Tom Chou
  - Department of Computational Medicine, UCLA, Los Angeles, CA, United States
  - Department of Mathematics, UCLA, Los Angeles, CA, United States
- Otto O Yang
  - Departments of Medicine and Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, CA, United States
5
Liu Y, Li XC, Rashidi Mehrabadi F, Schäffer AA, Pratt D, Crawford DR, Malikić S, Molloy EK, Gopalan V, Mount SM, Ruppin E, Aldape KD, Sahinalp SC. Single-cell methylation sequencing data reveal succinct metastatic migration histories and tumor progression models. Genome Res 2023; 33:1089-1100. [PMID: 37316351 PMCID: PMC10538489 DOI: 10.1101/gr.277608.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 06/06/2023] [Indexed: 06/16/2023]
Abstract
Recent studies exploring the impact of methylation in tumor evolution suggest that although the methylation status of many CpG sites is preserved across distinct lineages, that of others is altered as the cancer progresses. Because changes in the methylation status of a CpG site may be retained in mitosis, they could be used to infer the progression history of a tumor via single-cell lineage tree reconstruction. In this work, we introduce the first principled distance-based computational method, Sgootr, for inferring a tumor's single-cell methylation lineage tree and for jointly identifying lineage-informative CpG sites that harbor changes in methylation status that are retained along the lineage. We apply Sgootr to single-cell bisulfite-treated whole-genome sequencing data of multiregionally sampled tumor cells from nine metastatic colorectal cancer patients, as well as multiregionally sampled single-cell reduced-representation bisulfite sequencing data from a glioblastoma patient. We show that the tumor lineages constructed reveal a simple model underlying tumor progression and metastatic seeding. A comparison of Sgootr against alternative approaches shows that Sgootr can construct lineage trees with fewer migration events and in closer concordance with the sequential-progression model of tumor evolution, with a running time a fraction of that used in prior studies. Lineage-informative CpG sites identified by Sgootr are in inter-CpG island (CGI) regions, as opposed to intra-CGIs, which have been the main regions of interest in genomic methylation-related analyses.
Affiliation(s)
- Yuelin Liu
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Xuan Cindy Li
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, Maryland 20742, USA
- Farid Rashidi Mehrabadi
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
  - Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Alejandro A Schäffer
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Drew Pratt
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- David R Crawford
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
  - Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, Maryland 20742, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Salem Malikić
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Erin K Molloy
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Vishaka Gopalan
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Stephen M Mount
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA
- Eytan Ruppin
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Kenneth D Aldape
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- S Cenk Sahinalp
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
6
Characterization and expression analysis of bHLH transcription factors reveal their putative regulatory effects on nectar spur development in Aquilegia species. Gene 2023; 852:147057. [PMID: 36410606 DOI: 10.1016/j.gene.2022.147057] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/20/2022] [Revised: 10/27/2022] [Accepted: 11/14/2022] [Indexed: 11/19/2022]
Abstract
The nectar spur is a hollow extension of certain flower parts and shows strikingly diverse sizes and shapes in Aquilegia. Nectar spur development involves cell division and cell expansion processes. The basic helix-loop-helix (bHLH) transcription factors (TFs) control diverse aspects of organ morphogenesis, including cell division and cell expansion. However, the role of bHLH genes in nectar spur development in Aquilegia is largely unknown. We conducted a genome-wide identification of the bHLH gene family in Aquilegia to determine structural characteristics and phylogenetic relationships, and to analyze expression profiles of these genes during nectar spur development in spurless and spurred species. A total of 120 AqbHLH genes were identified from the Aquilegia coerulea genome. The phylogenetic tree showed that AqbHLH proteins were divided into 15 subfamilies, among which the S7 and S8 subfamilies underwent marked expansion. The AqbHLH genes in the same clade had similar motif composition and gene structure characteristics. Conserved residue analysis identified nineteen residues with more than 50% conservation in the four conserved regions. In the upstream sequences of AqbHLH genes, the light-responsive element was the most abundant cis-acting element. Eighteen AqbHLH genes showed syntenic relationships, and eight genes from four syntenic pairs underwent tandem duplications. According to expression profiling based on public RNA-Seq data and qRT-PCR results, five AqbHLH genes (AqbHLH027, AqbHLH046, AqbHLH082, AqbHLH083 and AqbHLH092) were differentially expressed between different tissues in A. coerulea at early developmental stages, as well as between spurless and spurred Aquilegia species. Of these, AqbHLH046 was not only highly expressed in the spur compared with the blade, but also showed higher expression levels in spurred species than in spurless species, suggesting that it plays an essential role in spur development by regulating cell division. This study lays a foundation for investigating the function of the AqbHLH gene family in nectar spur development, and has potential implications for speciation and genetic breeding in the genus Aquilegia.
7
Zheng L, Niknafs N, Wood LD, Karchin R, Scharpf RB. Estimation of cancer cell fractions and clone trees from multi-region sequencing of tumors. Bioinformatics 2022; 38:3677-3683. [PMID: 35642899 PMCID: PMC9344857 DOI: 10.1093/bioinformatics/btac367] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/23/2021] [Revised: 05/23/2022] [Accepted: 05/27/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Multi-region sequencing of solid tumors can improve our understanding of intratumor subclonal diversity and the evolutionary history of mutational events. Due to uncertainty in clonal composition and the multitude of possible ancestral relationships between clones, elucidating the most probable relationships from bulk tumor sequencing poses statistical and computational challenges. RESULTS: We developed a Bayesian hierarchical model called PICTograph to model uncertainty in assigning mutations to subclones, to enable posterior distributions of cancer cell fractions, and to visualize the most probable ancestral relationships between subclones. Compared to available methods, PICTograph provided more consistent and accurate estimates of cancer cell fractions and improved tree inference over a range of simulated clonal diversity. Application of PICTograph to multi-region whole exome sequencing of tumors from individuals with pancreatic cancer precursor lesions confirmed known early-occurring mutations and indicated substantial molecular diversity, including 6-12 distinct subclones and intra-sample mixing of subclones. Using ensemble-based visualizations, we highlight highly probable evolutionary relationships recovered in multiple models. PICTograph provides a useful approximation to evolutionary inference from cross-sectional multi-region sequencing, particularly for complex cases. AVAILABILITY: https://github.com/KarchinLab/pictograph
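For readers unfamiliar with the quantity being estimated, the sketch below shows a simple point estimate of a cancer cell fraction (CCF) from a variant allele frequency. It is not PICTograph's Bayesian hierarchical model, which yields posterior distributions; it only inverts the commonly used expected-VAF relation for a single mutation. The function name `ccf_point_estimate` and its parameters are hypothetical, and a normal-cell copy number of 2 is assumed.

```python
def ccf_point_estimate(vaf, purity, tumor_copy_number=2, multiplicity=1):
    """Point estimate of the cancer cell fraction for one mutation.

    Assumed relation (normal cells carry 2 copies of the locus):
        expected VAF = purity * multiplicity * CCF
                       / (purity * tumor_copy_number + 2 * (1 - purity))
    vaf:               observed variant allele frequency (0..1)
    purity:            fraction of tumor cells in the sample (0 < purity <= 1)
    tumor_copy_number: total copies of the locus in tumor cells
    multiplicity:      mutated copies per mutated tumor cell
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity <= 0:
        raise ValueError("multiplicity must be positive")

    # Average number of copies of the locus per cell in the mixed sample.
    mean_copies = purity * tumor_copy_number + 2 * (1 - purity)
    # Sampling noise can push the raw estimate above 1; it is returned
    # unclipped so that the caller can decide how to handle such values.
    return vaf * mean_copies / (purity * multiplicity)


if __name__ == "__main__":
    # Hypothetical heterozygous clonal mutation in a 50% pure diploid sample.
    print(ccf_point_estimate(vaf=0.25, purity=0.5))
```

A single-mutation inversion like this ignores read-count uncertainty and ambiguity in copy number and multiplicity, which is the kind of uncertainty a hierarchical model is designed to propagate.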
Affiliation(s)
- Lily Zheng
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, 21205, U.S.A.
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, 21205, U.S.A.
- Noushin Niknafs
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, U.S.A.
- Laura D Wood
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, U.S.A.
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, 21205, U.S.A.
- Rachel Karchin
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, 21205, U.S.A.
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, U.S.A.
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21205, U.S.A.
- Robert B Scharpf
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, U.S.A.
8
Zhao Y, Fu X, Lopez JI, Rowan A, Au L, Fendler A, Hazell S, Xu H, Horswell S, Shepherd STC, Spain L, Byrne F, Stamp G, O'Brien T, Nicol D, Augustine M, Chandra A, Rudman S, Toncheva A, Pickering L, Sahai E, Larkin J, Bates PA, Swanton C, Turajlic S, Litchfield K. Selection of metastasis competent subclones in the tumour interior. Nat Ecol Evol 2021; 5:1033-1045. [PMID: 34002049 PMCID: PMC7611703 DOI: 10.1038/s41559-021-01456-6] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 08/18/2020] [Accepted: 03/03/2021] [Indexed: 02/07/2023]
Abstract
The genetic evolutionary features of solid tumour growth are becoming increasingly well described, but the spatial and physical nature of subclonal growth remains unclear. Here, we utilize 102 macroscopic whole-tumour images from clear cell renal cell carcinoma patients, with matched genetic and phenotypic data from 756 biopsies. Utilizing a digital image processing pipeline, a renal pathologist marked the boundaries between tumour and normal tissue and extracted positions of boundary line and biopsy regions to X and Y coordinates. We then integrated coordinates with genomic data to map exact spatial subclone locations, revealing how genetically distinct subclones grow and evolve spatially. We observed a phenotype of advanced and more aggressive subclonal growth in the tumour centre, characterized by an elevated burden of somatic copy number alterations and higher necrosis, proliferation rate and Fuhrman grade. Moreover, we found that metastasizing subclones preferentially originate from the tumour centre. Collectively, these observations suggest a model of accelerated evolution in the tumour interior, with harsh hypoxic environmental conditions leading to a greater opportunity for driver somatic copy number alterations to arise and expand due to selective advantage. Tumour subclone growth is predominantly spatially contiguous in nature. We found only two cases of subclone dispersal, one of which was associated with metastasis. The largest subclones spatially were dominated by driver somatic copy number alterations, suggesting that a large selective advantage can be conferred to subclones upon acquisition of these alterations. In conclusion, spatial dynamics is strongly associated with genomic alterations and plays an important role in tumour evolution.
Affiliation(s)
- Yue Zhao
  - Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
  - Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
  - Department of Thoracic Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Xiao Fu
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
- Jose I Lopez
  - Department of Pathology, Cruces University Hospital, Biocruces-Bizkaia Institute, Barakaldo, Spain
- Andrew Rowan
  - Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
- Lewis Au
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
  - Renal and Skin Unit, the Royal Marsden NHS Foundation Trust, London, UK
- Annika Fendler
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
- Steve Hazell
  - Department of Pathology, The Royal Marsden NHS Foundation Trust, London, UK
- Hang Xu
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Stuart Horswell
  - Department of Bioinformatics and Biostatistics, The Francis Crick Institute, London, UK
- Scott T C Shepherd
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
  - Renal and Skin Unit, the Royal Marsden NHS Foundation Trust, London, UK
- Lavinia Spain
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
  - Renal and Skin Unit, the Royal Marsden NHS Foundation Trust, London, UK
- Fiona Byrne
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
- Gordon Stamp
  - Experimental Histopathology Laboratory, The Francis Crick Institute, London, UK
- Tim O'Brien
  - Urology Centre, Guy's and St. Thomas' NHS Foundation Trust, London, UK
- David Nicol
  - Department of Urology, The Royal Marsden NHS Foundation Trust, London, UK
- Marcellus Augustine
  - Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
- Ashish Chandra
  - Department of Pathology, Guy's and St. Thomas' NHS Foundation Trust, London, UK
- Sarah Rudman
  - Department of Medical Oncology, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Lisa Pickering
  - Renal and Skin Unit, the Royal Marsden NHS Foundation Trust, London, UK
- Erik Sahai
  - Tumour Cell Biology Laboratory, The Francis Crick Institute, London, UK
- James Larkin
  - Renal and Skin Unit, the Royal Marsden NHS Foundation Trust, London, UK
- Paul A Bates
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
- Charles Swanton
  - Cancer Evolution and Genome Instability Laboratory, The Francis Crick Institute, London, UK
  - Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
  - Department of Medical Oncology, University College London Hospitals, London, UK
- Samra Turajlic
  - Cancer Dynamics Laboratory, The Francis Crick Institute, London, UK
  - Renal and Skin Unit, the Royal Marsden NHS Foundation Trust, London, UK
- Kevin Litchfield
  - Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
  - Tumour Immunogenomics and Immunosurveillance Laboratory, University College London Cancer Institute, London, UK
9
Abstract
Syntenies are genomic segments of consecutive genes identified by a certain conservation in gene content and order. The notion of conservation may vary from one definition to another: the more constrained definitions require identical gene contents and gene orders, while more relaxed definitions require only a certain similarity in gene content, not necessarily in the same order. Regardless of the way they are identified, the goal is to characterize homologous genomic regions, i.e., regions deriving from a common ancestral region, reflecting a certain gene co-evolution that can shed light on important functional properties. In addition to identifying them, it is also necessary to infer the evolutionary history that has led from the ancestral segment to the extant ones. In this field, most algorithmic studies address the problem of inferring rearrangement scenarios explaining the disruption in gene order between segments with the same gene content, some of them extending the evolutionary model to gene insertion and deletion. However, syntenies also evolve through other events modifying their gene content, such as duplications, losses or horizontal gene transfers, i.e., the movement of genes from one species to another. Although the reconciliation approach between a gene tree and a species tree addresses the problem of inferring such events for single-gene families, little effort has been dedicated to the generalization to segmental events and to syntenies. This paper reviews some of the main algorithmic methods for inferring ancestral syntenies and focuses on those integrating both gene orders and gene trees.
10
Wang S, Ge S, Colijn C, Biller P, Wang L, Elliott LT. Estimating Genetic Similarity Matrices Using Phylogenies. J Comput Biol 2021; 28:587-600. [PMID: 33926225 PMCID: PMC8219189 DOI: 10.1089/cmb.2020.0375] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022] Open
Abstract
Genetic similarity is a measure of the genetic relatedness among individuals. The standard method for computing genetic similarity matrices involves the inner product of observed genetic variants. Such an approach is inaccurate or impossible if genotypes are not available, or not densely sampled, or of poor quality (e.g., genetic analysis of extinct species). We provide a new method for computing genetic similarities among individuals using phylogenetic trees. Our method can supplement (or stand in for) computations based on genotypes. We provide simulations suggesting that the genetic similarity matrices computed from trees are consistent with those computed from genotypes. With our methods, quantitative analysis of genetic traits and analysis of heritability and coheritability can be conducted directly using genetic similarity matrices, and thus in the absence of genotype data or under uncertainty in the phylogenetic tree. We use simulation studies to demonstrate the advantages of our method, and we provide applications to data.
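The genotype-based baseline described in this abstract, an inner product of observed variants, can be sketched as follows. This is a minimal illustration, not the authors' tree-based method. It assumes genotypes coded as allele counts (0, 1, 2), centers each variant by its sample mean, and divides by the number of variants; variance scaling, which many implementations add, is omitted. The function name `genetic_similarity_matrix` is hypothetical.

```python
def genetic_similarity_matrix(genotypes):
    """Genotype-based genetic similarity matrix (illustrative sketch).

    genotypes: list of individuals, each a list of allele counts (0, 1, 2)
               with one entry per variant; all rows must have equal length.
    Returns an n x n list of lists K with
        K[i][j] = (1 / m) * sum over variants of centered_i * centered_j,
    where m is the number of variants.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if any(len(row) != m for row in genotypes):
        raise ValueError("all individuals need the same number of variants")

    # Center each variant (column) by its mean across individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Inner products between every pair of centered genotype vectors.
    return [
        [sum(a * b for a, b in zip(ci, cj)) / m for cj in centered]
        for ci in centered
    ]


if __name__ == "__main__":
    # Hypothetical genotypes: three individuals, four variants.
    for row in genetic_similarity_matrix([[0, 1, 2, 0], [0, 1, 2, 1], [2, 1, 0, 1]]):
        print(row)
```

The matrix is symmetric, and larger off-diagonal entries indicate individuals whose genotypes deviate from the sample mean in the same direction. When genotypes are missing or sparse, this computation is the step that breaks down.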
Affiliation(s)
- Shijia Wang
  - School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Shufei Ge
  - Institute of Mathematical Sciences, ShanghaiTech University, Shanghai, China
- Caroline Colijn
  - Department of Mathematics, Simon Fraser University, Burnaby, Canada
- Priscila Biller
  - Department of Mathematics, Simon Fraser University, Burnaby, Canada
- Liangliang Wang
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
- Lloyd T Elliott
  - Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
11
Aganezov S, Raphael BJ. Reconstruction of clone- and haplotype-specific cancer genome karyotypes from bulk tumor samples. Genome Res 2020; 30:1274-1290. [PMID: 32887685 PMCID: PMC7545144 DOI: 10.1101/gr.256701.119] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/03/2019] [Accepted: 08/07/2020] [Indexed: 12/25/2022]
Abstract
Many cancer genomes are extensively rearranged with aberrant chromosomal karyotypes. Deriving these karyotypes from high-throughput DNA sequencing of bulk tumor samples is complicated because most tumors are a heterogeneous mixture of normal cells and subpopulations of cancer cells, or clones, that harbor distinct somatic mutations. We introduce a new algorithm, Reconstructing Cancer Karyotypes (RCK), to reconstruct haplotype-specific karyotypes of one or more rearranged cancer genomes from DNA sequencing data from a bulk tumor sample. RCK leverages evolutionary constraints on the somatic mutational process in cancer to reduce ambiguity in the deconvolution of admixed sequencing data into multiple haplotype-specific cancer karyotypes. RCK models mixtures containing an arbitrary number of derived genomes and allows the incorporation of information both from short-read and long-read DNA sequencing technologies. We compare RCK to existing approaches on 17 primary and metastatic prostate cancer samples. We find that RCK infers cancer karyotypes that better explain the DNA sequencing data and conform to a reasonable evolutionary model. RCK's reconstructions of clone- and haplotype-specific karyotypes will aid further studies of the role of intra-tumor heterogeneity in cancer development and response to treatment. RCK is freely available as open source software.
Affiliation(s)
- Sergey Aganezov
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
| | - Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
| |
|
12
|
Greenman CD, Penso-Dolfin L, Wu T. The complexity of genome rearrangement combinatorics under the infinite sites model. J Theor Biol 2020; 501:110335. [DOI: 10.1016/j.jtbi.2020.110335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2019] [Revised: 04/16/2020] [Accepted: 05/14/2020] [Indexed: 11/30/2022]
|
13
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
| |
|
14
|
Reiter JG, Makohon-Moore AP, Gerold JM, Heyde A, Attiyeh MA, Kohutek ZA, Tokheim CJ, Brown A, DeBlasio RM, Niyazov J, Zucker A, Karchin R, Kinzler KW, Iacobuzio-Donahue CA, Vogelstein B, Nowak MA. Minimal functional driver gene heterogeneity among untreated metastases. Science 2018; 361:1033-1037. [PMID: 30190408 DOI: 10.1126/science.aat7171] [Citation(s) in RCA: 216] [Impact Index Per Article: 30.9] [Received: 03/27/2018] [Accepted: 08/02/2018] [Indexed: 12/31/2022]
Abstract
Metastases are responsible for the majority of cancer-related deaths. Although genomic heterogeneity within primary tumors is associated with relapse, heterogeneity among treatment-naïve metastases has not been comprehensively assessed. We analyzed sequencing data for 76 untreated metastases from 20 patients and inferred cancer phylogenies for breast, colorectal, endometrial, gastric, lung, melanoma, pancreatic, and prostate cancers. We found that within individual patients, a large majority of driver gene mutations are common to all metastases. Further analysis revealed that the driver gene mutations that were not shared by all metastases are unlikely to have functional consequences. A mathematical model of tumor evolution and metastasis formation provides an explanation for the observed driver gene homogeneity. Thus, single biopsies capture most of the functionally important mutations in metastases and therefore provide essential information for therapeutic decision-making.
Affiliation(s)
- Johannes G Reiter
- Canary Center for Cancer Early Detection, Department of Radiology, Stanford University School of Medicine, Palo Alto, CA 94305, USA.,Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138, USA
| | - Alvin P Makohon-Moore
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Jeffrey M Gerold
- Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138, USA
| | - Alexander Heyde
- Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138, USA
| | - Marc A Attiyeh
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Zachary A Kohutek
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Collin J Tokheim
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alexia Brown
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Rayne M DeBlasio
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Juliana Niyazov
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Amanda Zucker
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Rachel Karchin
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA.,Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Kenneth W Kinzler
- The Ludwig Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.,The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.,Sidney Kimmel Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Christine A Iacobuzio-Donahue
- The David M. Rubenstein Center for Pancreatic Cancer Research, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Bert Vogelstein
- The Ludwig Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.,The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.,Sidney Kimmel Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.,Howard Hughes Medical Institute, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Martin A Nowak
- Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138, USA.,Department of Organismic and Evolutionary Biology and Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
| |
|
15
|
Zaccaria S, El-Kebir M, Klau GW, Raphael BJ. Phylogenetic Copy-Number Factorization of Multiple Tumor Samples. J Comput Biol 2018; 25:689-708. [PMID: 29658782 PMCID: PMC6067108 DOI: 10.1089/cmb.2017.0253] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest number of CNAs that explain the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher resolution view of copy-number evolution of this cancer than published analyses.
Affiliation(s)
- Simone Zaccaria
- Department of Computer Science, Princeton University, Princeton, New Jersey
- Dipartimento di Informatica Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Milan, Italy
| | - Mohammed El-Kebir
- Department of Computer Science, Princeton University, Princeton, New Jersey
| | - Gunnar W. Klau
- Algorithmic Bioinformatics, Heinrich Heine University, Düsseldorf, Germany
| | | |
|
16
|
Comparing Apples to Apples and Oranges to Oranges. Trends Genet 2018; 34:571-572. [PMID: 29853203 DOI: 10.1016/j.tig.2018.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2018] [Accepted: 05/15/2018] [Indexed: 11/23/2022]
Abstract
A new study sequenced and assembled two rodent genomes to better understand the evolutionary forces shaping mammalian genomes. Their results suggest multiple roles for genomic repeats.
|
17
|
Abstract
Integrated analysis of structural variants (SVs) and copy number alterations in aneuploid cancer genomes is key to understanding tumor genome complexity. A recently developed algorithm, Weaver, can estimate, for the first time, allele-specific copy number of SVs and their interconnectivity in aneuploid cancer genomes. However, one major limitation is that not all SVs identified by Weaver are phased. In this article, we develop a general convex programming framework that predicts the interconnectivity of unphased SVs with possibly noisy allele-specific copy number estimations as input. We demonstrated through applications to both simulated data and HeLa whole-genome sequencing data that our method is robust to the noise in the input copy numbers and can predict SV phasings with high specificity. We found that our method can make consistent predictions with Weaver even if a large proportion of the input variants are unphased. We also applied our method to The Cancer Genome Atlas (TCGA) ovarian cancer whole-genome sequencing samples to phase SVs left unphased by Weaver. Our work provides an important new algorithmic framework for recovering more complete allele-specific cancer genome graphs.
Affiliation(s)
- Ashok Rajaraman
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
| | - Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
| |
|
18
|
Genetic alterations driving metastatic colony formation are acquired outside of the primary tumour in melanoma. Nat Commun 2018; 9:595. [PMID: 29426936 PMCID: PMC5807512 DOI: 10.1038/s41467-017-02674-y] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 10/04/2017] [Accepted: 12/19/2017] [Indexed: 02/07/2023]
Abstract
Mouse models indicate that metastatic dissemination occurs extremely early; however, the timing in human cancers is unknown. We therefore determined the time point of metastatic seeding relative to tumour thickness and genomic alterations in melanoma. Here, we find that lymphatic dissemination occurs shortly after dermal invasion of the primary lesion at a median thickness of ~0.5 mm and that typical driver changes, including BRAF mutation and gained or lost regions comprising genes like MET or CDKN2A, are acquired within the lymph node at the time of colony formation. These changes define a colonisation signature that was linked to xenograft formation in immunodeficient mice and death from melanoma. Thus, melanoma cells leave primary tumours early and evolve at different sites in parallel. We propose a model of metastatic melanoma dormancy, evolution and colonisation that will inform direct monitoring of adjuvant therapy targets.
|
19
|
Anselmetti Y, Luhmann N, Bérard S, Tannier E, Chauve C. Comparative Methods for Reconstructing Ancient Genome Organization. Methods Mol Biol 2018; 1704:343-362. [PMID: 29277873 DOI: 10.1007/978-1-4939-7463-4_13] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]
Abstract
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.
Affiliation(s)
- Yoann Anselmetti
- Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
| | - Nina Luhmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany.,Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.,International Research Training Group1906, Bielefeld University, Bielefeld, Germany
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
| | - Eric Tannier
- UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive", Inria Grenoble Rhône-Alpes and University of Lyon, Lyon, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6.
| |
|
20
|
Zafar H, Tzen A, Navin N, Chen K, Nakhleh L. SiFit: inferring tumor trees from single-cell sequencing data under finite-sites models. Genome Biol 2017; 18:178. [PMID: 28927434 PMCID: PMC5606061 DOI: 10.1186/s13059-017-1311-2] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 05/07/2017] [Accepted: 08/28/2017] [Indexed: 02/06/2023]
Abstract
Single-cell sequencing enables the inference of tumor phylogenies that provide insights on intra-tumor heterogeneity and evolutionary trajectories. Recently introduced methods perform this task under the infinite-sites assumption, violations of which, due to chromosomal deletions and loss of heterozygosity, necessitate the development of inference methods that utilize finite-sites models. We propose a statistical inference method for tumor phylogenies from noisy single-cell sequencing data under a finite-sites model. The performance of our method on synthetic and experimental data sets from two colorectal cancer patients to trace evolutionary lineages in primary and metastatic tumors suggests that employing a finite-sites model leads to improved inference of tumor phylogenies.
Affiliation(s)
- Hamim Zafar
- Department of Computer Science, Rice University, Houston, Texas, USA.,Department of Bioinformatics and Computational Biology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
| | - Anthony Tzen
- Department of Computer Science, Rice University, Houston, Texas, USA
| | - Nicholas Navin
- Department of Bioinformatics and Computational Biology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA.,Department of Genetics, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA.
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas, USA.
| |
|
21
|
Lucas JMEX, Roest Crollius H. High precision detection of conserved segments from synteny blocks. PLoS One 2017; 12:e0180198. [PMID: 28671949 PMCID: PMC5495381 DOI: 10.1371/journal.pone.0180198] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/29/2016] [Accepted: 06/12/2017] [Indexed: 11/19/2022]
Abstract
A conserved segment, i.e. a segment of chromosome unbroken during evolution, is an important operational concept in comparative genomics. Until now, algorithms that are designed to identify conserved segments often return synteny blocks that overlap, synteny blocks that include micro-rearrangements or synteny blocks erroneously short. Here we present definitions of conserved segments and synteny blocks independent of any heuristic method and we describe four new post-processing strategies to refine synteny blocks into accurate conserved segments. The first strategy identifies micro-rearrangements, the second strategy identifies mono-genic conserved segments, the third returns non-overlapping segments and the fourth repairs incorrect ruptures of synteny. All these refinements are implemented in a new version of PhylDiag that has been benchmarked against i-ADHoRe 3.0 and Cyntenator, based on a realistic simulated evolution and true simulated conserved segments.
Affiliation(s)
- Joseph MEX Lucas
- IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
| | - Hugues Roest Crollius
- IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
| |
|
22
|
Abstract
Background: Isometric gene tree reconciliation is a gene tree/species tree reconciliation problem where both the gene tree and the species tree include branch lengths, and these branch lengths must be respected by the reconciliation. The problem was introduced by Ma et al. in 2008 in the context of reconstructing evolutionary histories of genomes in the infinite sites model. Results: In this paper, we show that the original algorithm by Ma et al. is incorrect, and we propose a modified algorithm that addresses the problems that we discovered. We have also improved the running time from [Formula: see text] to [Formula: see text], where N is the total number of nodes in the two input trees. Finally, we examine two new variants of the problem: reconciliation of two unrooted trees and scaling of branch lengths of the gene tree during reconciliation of two rooted trees. Conclusions: We provide several new algorithms for isometric reconciliation of trees. Some questions in this area remain open; most importantly, extensions of the problem allowing for imprecise estimates of branch lengths.
|
23
|
Reiter JG, Makohon-Moore AP, Gerold JM, Bozic I, Chatterjee K, Iacobuzio-Donahue CA, Vogelstein B, Nowak MA. Reconstructing metastatic seeding patterns of human cancers. Nat Commun 2017; 8:14114. [PMID: 28139641 PMCID: PMC5290319 DOI: 10.1038/ncomms14114] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 06/14/2016] [Accepted: 11/24/2016] [Indexed: 12/12/2022]
Abstract
Reconstructing the evolutionary history of metastases is critical for understanding their basic biological principles and has profound clinical implications. Genome-wide sequencing data has enabled modern phylogenomic methods to accurately dissect subclones and their phylogenies from noisy and impure bulk tumour samples at unprecedented depth. However, existing methods are not designed to infer metastatic seeding patterns. Here we develop a tool, called Treeomics, to reconstruct the phylogeny of metastases and map subclones to their anatomic locations. Treeomics infers comprehensive seeding patterns for pancreatic, ovarian, and prostate cancers. Moreover, Treeomics correctly disambiguates true seeding patterns from sequencing artifacts; 7% of variants were misclassified by conventional statistical methods. These artifacts can skew phylogenies by creating illusory tumour heterogeneity among distinct samples. In silico benchmarking on simulated tumour phylogenies across a wide range of sample purities (15–95%) and sequencing depths (25–800×) demonstrates the accuracy of Treeomics compared with existing methods. Tumours frequently metastasize to multiple anatomical sites and understanding how these different metastases evolve may be important for therapy. Here, the authors develop a method, Treeomics, that can construct phylogenies from multiple metastases from next-generation sequencing data.
Affiliation(s)
- Johannes G Reiter
- Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts 02138, USA.,IST (Institute of Science and Technology) Austria, Klosterneuburg 3400, Austria
| | - Alvin P Makohon-Moore
- The David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.,Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| | - Jeffrey M Gerold
- Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Ivana Bozic
- Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts 02138, USA.,Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA
| | | | - Christine A Iacobuzio-Donahue
- The David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.,Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA.,Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
| | - Bert Vogelstein
- The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA.,The Ludwig Center and Howard Hughes Medical Institute at The Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA
| | - Martin A Nowak
- Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts 02138, USA.,Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA.,Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
| |
|
24
|
Stochastic tunneling and metastable states during the somatic evolution of cancer. Genetics 2015; 199:1213-28. [PMID: 25624316 DOI: 10.1534/genetics.114.171553] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 08/25/2014] [Accepted: 01/19/2015] [Indexed: 12/29/2022]
Abstract
Tumors initiate when a population of proliferating cells accumulates a certain number and type of genetic and/or epigenetic alterations. The population dynamics of such sequential acquisition of (epi)genetic alterations has been the topic of much investigation. The phenomenon of stochastic tunneling, where an intermediate mutant in a sequence does not reach fixation in a population before generating a double mutant, has been studied using a variety of computational and mathematical methods. However, the field still lacks a comprehensive analytical description since theoretical predictions of fixation times are available only for cases in which the second mutant is advantageous. Here, we study stochastic tunneling in a Moran model. Analyzing the deterministic dynamics of large populations we systematically identify the parameter regimes captured by existing approaches. Our analysis also reveals fitness landscapes and mutation rates for which finite populations are found in long-lived metastable states. These are landscapes in which the final mutant is not the most advantageous in the sequence, and resulting metastable states are a consequence of a mutation-selection balance. The escape from these states is driven by intrinsic noise, and their location affects the probability of tunneling. Existing methods no longer apply. In these regimes it is the escape from the metastable states that is the key bottleneck; fixation is no longer limited by the emergence of a successful mutant lineage. We used the so-called Wentzel-Kramers-Brillouin method to compute fixation times in these parameter regimes, successfully validated by stochastic simulations. Our work fills a gap left by previous approaches and provides a more comprehensive description of the acquisition of multiple mutations in populations of somatic cells.
|
25
|
Hu F, Zhou J, Zhou L, Tang J. Probabilistic Reconstruction of Ancestral Gene Orders with Insertions and Deletions. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:667-672. [PMID: 26356337 DOI: 10.1109/tcbb.2014.2309602] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 06/05/2023]
Abstract
Changes of gene orderings have been extensively used as a signal to reconstruct phylogenies and ancestral genomes. Inferring the gene order of an extinct species has a wide range of applications, including the potential to reveal more detailed evolutionary histories, to determine gene content and ordering, and to understand the consequences of structural changes for organismal function and species divergence. In this study, we propose a new adjacency-based method, PMAG+, to infer ancestral genomes under a more general model of gene evolution involving gene insertions and deletions (indels), in addition to gene rearrangements. PMAG+ improves on our previous method PMAG by developing a new approach to infer ancestral gene contents and reducing the adjacency assembly problem to an instance of TSP. We designed a series of experiments to extensively validate PMAG+ and compared the results with the most recent and comparable method GapAdj. According to the results, ancestral gene contents predicted by PMAG+ coincide highly with the actual contents, with error rates less than 1 percent. Under various degrees of indels, PMAG+ consistently achieves more accurate prediction of ancestral gene orders and, at the same time, produces contigs very close to the actual chromosomes.
|
26
|
Paten B, Zerbino DR, Hickey G, Haussler D. A unifying model of genome evolution under parsimony. BMC Bioinformatics 2014; 15:206. [PMID: 24946830 PMCID: PMC4082375 DOI: 10.1186/1471-2105-15-206] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/12/2013] [Accepted: 05/08/2014] [Indexed: 11/23/2022]
Abstract
Background: Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution, yet to date they have largely been pursued in isolation. Results: We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describe all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Conclusion: This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony.
Affiliation(s)
- Benedict Paten
- University of California, Santa Cruz, 1156 High St, 95064 Santa Cruz, USA.
| | | | | | | |
|
27
|
Brejová B, Kravec M, Landau GM, Vinař T. Fast computation of a string duplication history under no-breakpoint-reuse. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2014; 372:20130133. [PMID: 24751867 PMCID: PMC3996574 DOI: 10.1098/rsta.2013.0133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/03/2023]
Abstract
In this paper, we provide an O(n log² n log log n log* n) algorithm to compute a duplication history of a string under the no-breakpoint-reuse condition. The motivation for this problem stems from computational biology, in particular from the analysis of complex gene clusters. The problem is also related to computing edit distance with block operations, but in our scenario the start of the history is not fixed; it is chosen to minimize the distance measure.
Affiliation(s)
- Broňa Brejová
- Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia
| | - Martin Kravec
- Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia
| | - Gad M. Landau
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Department of Computer Science and Engineering, NYU-Poly, Six MetroTech Center, Brooklyn, NY 11201-3840, USA
| | - Tomáš Vinař
- Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynská dolina, 842 48 Bratislava, Slovakia
| |
|
28
|
Abstract
Background: The introduction of the double cut and join operation (DCJ) caused a flurry of research into the study of multichromosomal rearrangements. However, little of this work has incorporated indels (i.e., insertions and deletions of chromosomes and chromosomal intervals) into the calculation of genomic distance functions, with the exception of Braga et al., who provided a linear time algorithm for the problem of DCJ-indel sorting. Although their algorithm only takes linear time, its derivation is lengthy and depends on a large number of possible cases. Results: We note the simple idea that a deletion of a chromosomal interval can be viewed as a DCJ that creates a new circular chromosome. This framework will allow us to amortize indels as DCJs, which in turn permits the application of the classical breakpoint graph to obtain a simplified indel model that still solves the problem of DCJ-indel sorting in linear time via a more concise formulation that relies on the simpler problem of DCJ sorting. Furthermore, we can extend this result to fully characterize the solution space of DCJ-indel sorting. Conclusions: Encoding indels as DCJ operations offers a new insight into why the problem of DCJ-indel sorting is not ultimately any more difficult than that of sorting by DCJs alone. There is still room for research in this area, most notably the problem of sorting when the cost of indels is allowed to vary with respect to the cost of a DCJ and we demand a minimum cost transformation of one genome into another.
|
29
|
Abstract
One of the most difficult problems in modern genomics is the assembly of full-length chromosomes using next generation sequencing (NGS) data. To address this problem, we developed "reference-assisted chromosome assembly" (RACA), an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal fragments using comparative genome information and paired-end reads. Evaluation of results using simulated and real genome assemblies indicates that our approach can substantially improve genomes generated by a wide variety of de novo assemblers if a good reference assembly of a closely related species and outgroup genomes are available. We used RACA to reconstruct 60 Tibetan antelope (Pantholops hodgsonii) chromosome fragments from 1,434 SOAPdenovo sequence scaffolds, of which 16 chromosome fragments were homologous to complete cattle chromosomes. Experimental validation by PCR showed that predictions made by RACA are highly accurate. Our results indicate that RACA will significantly facilitate the study of chromosome evolution and genome rearrangements for the large number of genomes being sequenced by NGS that do not have a genetic or physical map.
|
30
|
Chauve C, El-Mabrouk N, Guéguen L, Semeria M, Tannier E. Duplication, Rearrangement and Reconciliation: A Follow-Up 13 Years Later. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
|
31
|
Moret BME, Lin Y, Tang J. Rearrangements in Phylogenetic Inference: Compare, Model, or Encode? Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 02/04/2023]
|
32
|
Suen S, Lu HHS, Yeang CH. Evolution of domain architectures and catalytic functions of enzymes in metabolic systems. Genome Biol Evol 2012; 4:976-93. [PMID: 22936075 PMCID: PMC3468959 DOI: 10.1093/gbe/evs072] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
Abstract
Domain architectures and catalytic functions of enzymes constitute the centerpieces of a metabolic network. These types of information are formulated as a two-layered network consisting of domains, proteins, and reactions: a domain-protein-reaction (DPR) network. We propose an algorithm to reconstruct the evolutionary history of DPR networks across multiple species and categorize the mechanisms of metabolic systems evolution in terms of network changes. The reconstructed history reveals distinct patterns of evolutionary mechanisms between prokaryotic and eukaryotic networks. Although the evolutionary mechanisms in early ancestors of prokaryotes and eukaryotes are quite similar, more novel and duplicated domain compositions with identical catalytic functions arise along the eukaryotic lineage. In contrast, prokaryotic enzymes become more versatile by catalyzing multiple reactions with similar chemical operations. Moreover, different metabolic pathways are enriched with distinct network evolution mechanisms. For instance, although the pathways of steroid biosynthesis, protein kinases, and glycosaminoglycan biosynthesis all constitute prominent features of animal-specific physiology, their evolution of domain architectures and catalytic functions follows distinct patterns. Steroid biosynthesis is enriched with reaction creations but retains a relatively conserved repertoire of domain compositions and proteins. Protein kinases retain conserved reactions but possess many novel domains and proteins. In contrast, glycosaminoglycan biosynthesis has high rates of reaction/protein creations and domain recruitments. Finally, we elicit and validate two general principles underlying the evolution of DPR networks: 1) duplicated enzyme proteins possess similar catalytic functions and 2) the majority of novel domains arise to catalyze novel reactions. These results shed new light on the evolution of metabolic systems.
Affiliation(s)
- Summit Suen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
|
33
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA.
|
34
|
Melián CJ, Alonso D, Allesina S, Condit RS, Etienne RS. Does sex speed up evolutionary rate and increase biodiversity? PLoS Comput Biol 2012; 8:e1002414. [PMID: 22412362 PMCID: PMC3297559 DOI: 10.1371/journal.pcbi.1002414] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 08/04/2011] [Accepted: 01/20/2012] [Indexed: 01/22/2023]
Abstract
Most empirical and theoretical studies have shown that sex increases the rate of evolution, although evidence of sex constraining genomic and epigenetic variation and slowing down evolution also exists. Faster rates with sex have been attributed to new gene combinations, removal of deleterious mutations, and adaptation to heterogeneous environments. Slower rates with sex have been attributed to removal of major genetic rearrangements, the cost of finding a mate, vulnerability to predation, and exposure to sexually transmitted diseases. Whether sex speeds or slows evolution, the connection between reproductive mode, the evolutionary rate, and species diversity remains largely unexplored. Here we present a spatially explicit model of ecological and evolutionary dynamics based on DNA sequence change to study the connection between mutation, speciation, and the resulting biodiversity in sexual and asexual populations. We show that faster speciation can decrease the abundance of newly formed species and thus decrease long-term biodiversity. In this way, sex can reduce diversity relative to asexual populations, because it leads to a higher rate of production of new species, but with lower abundances. Our results show that reproductive mode and the mechanisms underlying it can alter the link between mutation, evolutionary rate, speciation and biodiversity and we suggest that a high rate of evolution may not be required to yield high biodiversity. The role of sex in driving genetic variation and the speed at which new species emerge has been debated for over a century. There is experimental and theoretical evidence that sex increases genetic variation and the speed at which new species emerge, although evidence that sex reduces variation and slows the formation of new species also exists. Surprisingly, given the link between sex and genetic variation, little work has been done on the impact of sex on biodiversity. 
In the present theoretical study we show that a faster evolutionary rate can decrease the abundance of newly formed species and thus decrease long-term biodiversity. This leads to the paradoxical result that sexual reproduction can increase genetic variation but reduce species diversity. These results suggest that reducing the rate of appearance of genetic variation and the speed at which new species emerge may increase biodiversity in the long-term. This unexpected link between reproductive mode, the speed of evolution and biodiversity suggests that a high evolutionary rate may not be required to yield a large number of species in natural ecosystems.
Affiliation(s)
- Carlos J Melián
- National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, California, United States of America.
|
35
|
Corbi J, Dutheil JY, Damerval C, Tenaillon MI, Manicacci D. Accelerated evolution and coevolution drove the evolutionary history of AGPase sub-units during angiosperm radiation. Annals of Botany 2012; 109:693-708. [PMID: 22307567 PMCID: PMC3286274 DOI: 10.1093/aob/mcr303] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/15/2011] [Accepted: 11/07/2011] [Indexed: 05/10/2023]
Abstract
BACKGROUND AND AIMS ADP-glucose pyrophosphorylase (AGPase) is a key enzyme of starch biosynthesis. In the green plant lineage, it is composed of two large (LSU) and two small (SSU) sub-units encoded by paralogous genes, as a consequence of several rounds of duplication. First, our aim was to detect specific patterns of molecular evolution following duplication events and the divergence between monocotyledons and dicotyledons. Secondly, we investigated coevolution between amino acids both within and between sub-units. METHODS A phylogeny of each AGPase sub-unit was built using all gymnosperm and angiosperm sequences available in databases. Accelerated evolution along specific branches was tested using the ratio of the non-synonymous to the synonymous substitution rate. Coevolution between amino acids was investigated taking into account compensatory changes between co-substitutions. KEY RESULTS We showed that SSU paralogues evolved under high functional constraints during angiosperm radiation, with a significant level of coevolution between amino acids that participate in SSU major functions. In contrast, in the LSU paralogues, we identified residues under positive selection (1) following the first LSU duplication that gave rise to two paralogues mainly expressed in angiosperm source and sink tissues, respectively; and (2) following the emergence of grass-specific paralogues expressed in the endosperm. Finally, we found coevolution between residues that belong to the interaction domains of both sub-units. CONCLUSIONS Our results support the view that coevolution among amino acid residues, especially those lying in the interaction domain of each sub-unit, played an important role in AGPase evolution. First, within SSU, coevolution allowed compensating mutations in a highly constrained context. Secondly, the LSU paralogues probably acquired tissue-specific expression and regulatory properties via the coevolution between sub-unit interacting domains. 
Finally, the pattern we observed during LSU evolution is consistent with repeated sub-functionalization under 'Escape from Adaptive Conflict', a model rarely illustrated in the literature.
Affiliation(s)
- Jonathan Corbi
- CNRS, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
- Julien Y. Dutheil
- BiRC-Bioinformatics Research Center, Aarhus University, C.F. Møllers Alle 8, Building 1110, DK-8000 Århus C, Denmark
- Catherine Damerval
- CNRS, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
- Maud I. Tenaillon
- CNRS, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
- Domenica Manicacci
- Université Paris-Sud, UMR 0320/UMR 8120 Génétique Végétale, Ferme du Moulon, F-91190 Gif sur Yvette, France
|
36
|
Attie O, Darling AE, Yancopoulos S. The rise and fall of breakpoint reuse depending on genome resolution. BMC Bioinformatics 2011; 12 Suppl 9:S1. [PMID: 22151330 PMCID: PMC3283316 DOI: 10.1186/1471-2105-12-s9-s1] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND During evolution, large-scale genome rearrangements of chromosomes shuffle the order of homologous genome sequences ("synteny blocks") across species. Some years ago, a controversy erupted in genome rearrangement studies over whether rearrangements recur, causing breakpoints to be reused. METHODS We investigate this controversial issue using the synteny blocks for human-mouse-rat reported by Bourque et al. and a series of synteny blocks we generated using Mauve at resolutions ranging from coarse to very fine-scale. We conducted analyses to test how resolution affects the traditional measure of the breakpoint reuse rate. RESULTS We found that the inversion-based breakpoint reuse rate is low at fine-scale synteny block resolution and that it rises and eventually falls as synteny block resolution decreases. By analyzing the cycle structure of the breakpoint graph of human-mouse-rat synteny blocks for human-mouse and comparing with theoretically derived distributions for random genome rearrangements, we showed that the implied genome rearrangements at each level of resolution become more "random" as synteny block resolution diminishes. At the highest synteny block resolutions, the Hannenhalli-Pevzner inversion distance deviates from the Double Cut and Join distance, possibly due to small-scale transpositions or simply due to inclusion of erroneous synteny blocks. At synteny block resolutions as coarse as the Bourque et al. blocks, we show the breakpoint graph cycle structure has already converged to the pattern expected for a random distribution of synteny blocks. CONCLUSIONS The inferred breakpoint reuse rate depends on synteny block resolution in human-mouse genome comparisons. At fine-scale resolution, the cycle structure for the transformation appears less random compared to that for coarse resolution. Small synteny blocks may contain critical information for accurate reconstruction of genome rearrangement history and parameters.
Affiliation(s)
- Oliver Attie
- Department of Infectious Diseases, Mount Sinai School of Medicine, NY, NY 10029, USA
|
37
|
Ma J. Reconstructing the history of large-scale genomic changes: biological questions and computational challenges. J Comput Biol 2011; 18:879-93. [PMID: 21563973 DOI: 10.1089/cmb.2010.0189] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
In addition to point mutations, larger-scale structural changes (including rearrangements, duplications, insertions, and deletions) are also prevalent between different mammalian genomes. Capturing these large-scale changes is critical to unraveling the history of mammalian evolution in order to better understand the human genome. It also has profound biomedical significance, because many human diseases are associated with structural genomic aberrations. The increasing number of mammalian genomes being sequenced as well as the recent advancement in DNA sequencing technologies are allowing us to identify these structural genomic changes with vastly greater accuracy. However, there are a considerable number of computational challenges related to these problems. In this article, we introduce the ancestral genome reconstruction problem, which enables us to explain the large-scale genomic changes between species in an evolutionary context. The application of these methods to within-species structural variation and disease genome analysis is also discussed. The target audience of this article is advanced undergraduate students in biology.
Affiliation(s)
- Jian Ma
- Department of Bioengineering, Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
|
38
|
Roskin KM, Paten B, Haussler D. Meta-alignment with crumble and prune: partitioning very large alignment problems for performance and parallelization. BMC Bioinformatics 2011; 12:144. [PMID: 21569267 PMCID: PMC3114744 DOI: 10.1186/1471-2105-12-144] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/27/2010] [Accepted: 05/10/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Continuing research into the global multiple sequence alignment problem has resulted in more sophisticated and principled alignment methods. Unfortunately these new algorithms often require large amounts of time and memory to run, making it nearly impossible to run these algorithms on large datasets. As a solution, we present two general methods, Crumble and Prune, for breaking a phylogenetic alignment problem into smaller, more tractable sub-problems. We call Crumble and Prune meta-alignment methods because they use existing alignment algorithms and can be used with many current alignment programs. Crumble breaks long alignment problems into shorter sub-problems. Prune divides the phylogenetic tree into a collection of smaller trees to reduce the number of sequences in each alignment problem. These methods are orthogonal: they can be applied together to provide better scaling in terms of sequence length and in sequence depth. Both methods partition the problem such that many of the sub-problems can be solved independently. The results are then combined to form a solution to the full alignment problem. RESULTS Crumble and Prune each provide a significant performance improvement with little loss of accuracy. In some cases, a gain in accuracy was observed. Crumble and Prune were tested on real and simulated data. Furthermore, we have implemented a system called Job-tree that allows hierarchical sub-problems to be solved in parallel on a compute cluster, significantly shortening the run-time. CONCLUSIONS These methods enabled us to solve gigabase alignment problems. These methods could enable a new generation of biologically realistic alignment algorithms to be applied to real world, large scale alignment problems.
Affiliation(s)
- Krishna M Roskin
- Department of Computer Science, Univ. of California, Santa Cruz, USA
- Benedict Paten
- Center for Biomolecular Science & Engineering, Univ. of California, Santa Cruz, USA
- David Haussler
- Howard Hughes Medical Institute, Univ. of California, Santa Cruz, USA
|
39
|
Song G, Zhang L, Vinar T, Miller W. CAGE: Combinatorial Analysis of Gene-cluster Evolution. J Comput Biol 2011; 17:1227-42. [PMID: 20874406 DOI: 10.1089/cmb.2010.0094] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Much important evolutionary activity occurs in gene clusters, where a copy of a gene may be free to acquire new functions. Current computational methods to extract evolutionary information from sequence data for such clusters are suboptimal, in part because accurate sequence data are often lacking in these genomic regions, making existing methods difficult to apply. We describe a new method for reconstructing the recent evolutionary history of gene clusters, and evaluate its performance on both simulated data and actual human gene clusters.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, PA 16802, USA.
|
40
|
Vinar T, Brejová B, Song G, Siepel A. Reconstructing histories of complex gene clusters on a phylogeny. J Comput Biol 2011; 17:1267-79. [PMID: 20874408 DOI: 10.1089/cmb.2010.0090] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. These clusters are one of the major sources of evolutionary innovation, and they are linked to multiple diseases, including HIV and a variety of cancers. Understanding their evolutionary histories is a key to the application of comparative genomics methods in these regions of the genome. We propose a probabilistic model of gene cluster evolution on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate use of our methods in their analysis.
Affiliation(s)
- Tomás Vinar
- Faculty of Mathematics, Physics and Informatics, Comenius University , Bratislava, Slovakia
|
41
|
Li X, Zhu C, Lin Z, Wu Y, Zhang D, Bai G, Song W, Ma J, Muehlbauer GJ, Scanlon MJ, Zhang M, Yu J. Chromosome size in diploid eukaryotic species centers on the average length with a conserved boundary. Mol Biol Evol 2011; 28:1901-11. [PMID: 21239390 PMCID: PMC3098514 DOI: 10.1093/molbev/msr011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]
Abstract
Understanding genome and chromosome evolution is important for understanding genetic inheritance and evolution. Universal events comprising DNA replication, transcription, repair, mobile genetic element transposition, chromosome rearrangements, mitosis, and meiosis underlie inheritance and variation of living organisms. Although the genome of a species as a whole is important, chromosomes are the basic units subjected to genetic events that coin evolution to a large extent. Now that many complete genome sequences are available, we can address evolution and variation of individual chromosomes across species. For example, “How are the repeat and nonrepeat proportions of genetic codes distributed among different chromosomes in a multichromosome species?” “Is there a general rule behind the intuitive observation that chromosome lengths tend to be similar in a species, and if so, can we generalize any findings in chromosome content and size across different taxonomic groups?” Here, we show that chromosomes within a species do not show dramatic fluctuation in their content of mobile genetic elements as the proliferation of these elements increases from unicellular eukaryotes to vertebrates. Furthermore, we demonstrate that, notwithstanding the remarkable plasticity, there is an upper limit to chromosome-size variation in diploid eukaryotes with linear chromosomes. Strikingly, variation in chromosome size for 886 chromosomes in 68 eukaryotic genomes (including 22 human autosomes) can be viably captured by a single model, which predicts that the vast majority of the chromosomes in a species are expected to have a base pair length between 0.4035 and 1.8626 times the average chromosome length. This conserved boundary of chromosome-size variation, which prevails across a wide taxonomic range with few exceptions, indicates that cellular, molecular, and evolutionary mechanisms, possibly together, confine the chromosome lengths around a species-specific average chromosome length.
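The reported boundary can be applied to any set of chromosome lengths as a simple check. The sketch below only tests lengths against the two multipliers quoted in the abstract; it does not reproduce the authors' model, and the function name and example lengths are hypothetical.

```python
# Multipliers of the mean chromosome length, as quoted in the abstract.
LOWER, UPPER = 0.4035, 1.8626


def within_boundary(lengths_bp):
    """Return one boolean per chromosome: is its length inside the boundary?"""
    mean = sum(lengths_bp) / len(lengths_bp)
    return [LOWER * mean <= length <= UPPER * mean for length in lengths_bp]


# Hypothetical lengths in base pairs (not real data); the mean is 100.
flags = within_boundary([10, 100, 190])
```

With these made-up values the interval is 40.35 to 186.26, so only the middle chromosome falls inside it.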
Affiliation(s)
- Xianran Li
- Department of Agronomy, Kansas State University, KS, USA
|
42
|
Abstract
We present a graph-based model for representing two aligned genomic sequences. An alignment graph is a mixed graph consisting of two sets of vertices, each representing one of the input sequences, and three sets of edges. These edges allow the model to represent a number of evolutionary events. This model is used to perform sequence alignment at the level of nucleotides. We define a scoring function for alignment graphs. We show that minimizing the score is NP-complete. However, we present a dynamic programming algorithm that solves the minimization problem optimally for a certain class of alignments, called breakable arrangements. Algorithms for analyzing breakable arrangements are presented. We also present a greedy algorithm that is capable of representing reversals. We present a dynamic programming algorithm that optimally aligns two genomic sequences, when one of the input sequences is a breakable arrangement of the other. Comparing what we define as breakable arrangements to alignments generated by other algorithms, it is seen that many already aligned genomes fall into the category of being breakable. Moreover, the greedy algorithm is shown to represent reversals, besides rearrangements, mutations, and other evolutionary events.
Affiliation(s)
- Nahla A Belal
- Department of Computer Science, AAST, Alexandria, Egypt
|
43
|
Melián CJ, Vilas C, Baldó F, González-Ortegón E, Drake P, Williams RJ. Eco-evolutionary Dynamics of Individual-Based Food Webs. Adv Ecol Res 2011. [DOI: 10.1016/b978-0-12-386475-8.00006-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 01/04/2023]
|
44
|
Alekseyev MA, Pevzner PA. Comparative genomics reveals birth and death of fragile regions in mammalian evolution. Genome Biol 2010; 11:R117. [PMID: 21118492 PMCID: PMC3156956 DOI: 10.1186/gb-2010-11-11-r117] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 07/15/2010] [Revised: 10/05/2010] [Accepted: 11/30/2010] [Indexed: 12/15/2022]
Abstract
Background An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals raised some doubts about their existence. Results Here we demonstrate that fragile regions are subject to a birth and death process, implying that fragility has a limited evolutionary lifespan. Conclusions This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions as a phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.
Affiliation(s)
- Max A Alekseyev
- Department of Computer Science & Engineering, University of South Carolina, 301 Main St, Columbia, SC 29208, USA.
|
45
|
Brown JD, O'Neill RJ. Chromosomes, conflict, and epigenetics: chromosomal speciation revisited. Annu Rev Genomics Hum Genet 2010; 11:291-316. [PMID: 20438362 DOI: 10.1146/annurev-genom-082509-141554] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Indexed: 11/09/2022]
Abstract
Since Darwin first noted that the process of speciation was indeed the "mystery of mysteries," scientists have tried to develop testable models for the development of reproductive incompatibilities, the first step in the formation of a new species. Early theorists proposed that chromosome rearrangements were implicated in the process of reproductive isolation; however, the chromosomal speciation model has recently been questioned. In addition, recent data from hybrid model systems indicate that simple epistatic interactions, the Dobzhansky-Muller incompatibilities, are more complex. In fact, incompatibilities are quite broad, including interactions among heterochromatin, small RNAs, and distinct, epigenetically defined genomic regions such as the centromere. In this review, we will examine both classical and current models of chromosomal speciation and describe the "evolving" theory of genetic conflict, epigenetics, and chromosomal speciation.
Affiliation(s)
- Judith D Brown
- Department of Allied Health Sciences, University of Connecticut, Storrs, CT 06269, USA
|
46
|
Melián CJ, Alonso D, Vázquez DP, Regetz J, Allesina S. Frequency-dependent selection predicts patterns of radiations and biodiversity. PLoS Comput Biol 2010; 6. [PMID: 20865126 PMCID: PMC2928887 DOI: 10.1371/journal.pcbi.1000892] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 01/28/2010] [Accepted: 07/16/2010] [Indexed: 11/19/2022]
Abstract
Most empirical studies support a decline in speciation rates through time, although evidence for constant speciation rates also exists. Declining rates have been explained by invoking pre-existing niches, whereas constant rates have been attributed to non-adaptive processes such as sexual selection and mutation. Trends in speciation rate and the processes underlying it remain unclear, representing a critical information gap in understanding patterns of global diversity. Here we show that the temporal trend in the speciation rate can also be explained by frequency-dependent selection. We construct a frequency-dependent and DNA sequence-based model of speciation. We compare our model to empirical diversity patterns observed for cichlid fish and Darwin's finches, two classic systems for which speciation rates and richness data exist. Negative frequency-dependent selection both predicts the declining speciation rate found in cichlid fish and explains their species richness. For groups like the Darwin's finches, in which speciation rates are constant and diversity is lower, speciation rate is better explained by a model without frequency-dependent selection. Our analysis shows that differences in diversity may be driven by incipient species abundance with frequency-dependent selection. Our results demonstrate that genetic-distance-based speciation and frequency-dependent selection are sufficient to explain the high diversity observed in natural systems and, importantly, predict decay through time in speciation rate in the absence of pre-existing niches. Ecological opportunity, or filling a pre-existing unoccupied adaptive zone, is considered the dominant mechanism explaining the initial explosion of diversity. Although this type of niche filling can explain rates of diversification in some lineages, it is not sufficient for a radiation to occur.
Instead of attributing the propensity to have an explosion of new species to external influences like niche availability, an alternative hypothesis can be based in frequency-dependent selection driven by the ecology in which organisms are embedded or endogenous sources mediated by gametes during fertilization. We show that genome diversification driven by higher reproductive probability of rare genotypes generates rapid initial speciation followed by a plateau with very low speciation rates, as shown by most empirical data. The absence of advantage of rare genotypes generates speciation events at constant rates. We predict decline over time and constant speciation rate in the cichlids and Darwin's finches, respectively, thus providing an alternative hypothesis for the origin of radiations and biodiversity in the absence of pre-existing niche filling. In addition to predicting observed temporal trends in diversification, our analysis also highlights new mechanistic models of evolutionary biodiversity dynamics that may become suitable to generate neutral models for testing observed patterns in speciation rates and species diversity.
Affiliation(s)
- Carlos J Melián
- National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, Santa Barbara, California, United States of America.
|
47
|
progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 2010; 5:e11147. [PMID: 20593022 PMCID: PMC2892488 DOI: 10.1371/journal.pone.0011147] [Citation(s) in RCA: 2976] [Impact Index Per Article: 198.4] [Received: 03/10/2010] [Accepted: 05/24/2010] [Indexed: 11/21/2022]
Abstract
Background: Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. Methodology/Principal Findings: We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence.
Conclusions: The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.
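The core- and pan-genome notions mentioned in this abstract can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of aligned segment identifiers; the function name `core_and_pan`, the genome names, and the segment labels are hypothetical and are not part of progressiveMauve. The sketch only takes the intersection (segments present in every genome) and the union (segments present in at least one genome). It does not perform alignment or measure segment lengths in Mbp.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) for a mapping of genome name -> set of segment IDs.

    core: segments shared by every genome (set intersection).
    pan:  segments found in at least one genome (set union).
    """
    if not genomes:
        return set(), set()
    segment_sets = list(genomes.values())
    core = set.intersection(*segment_sets)
    pan = set.union(*segment_sets)
    return core, pan


# Hypothetical toy data: three genomes described by made-up segment labels.
toy_genomes = {
    "genome_A": {"s1", "s2", "s3"},
    "genome_B": {"s1", "s2", "s4"},
    "genome_C": {"s1", "s3", "s5"},
}
core, pan = core_and_pan(toy_genomes)
print(sorted(core), sorted(pan))
```

In a real analysis the sizes would be summed over segment lengths rather than counted, which is how figures such as a 2.46 Mbp core and a 15.2 Mbp pan-genome would arise.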
|
48
|
Genomes as documents of evolutionary history. Trends Ecol Evol 2010; 25:224-32. [DOI: 10.1016/j.tree.2009.09.007] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Received: 01/21/2009] [Revised: 09/18/2009] [Accepted: 09/21/2009] [Indexed: 02/02/2023]
|
49
|
Lajoie M, Bertrand D, El-Mabrouk N. Inferring the evolutionary history of gene clusters from phylogenetic and gene order data. Mol Biol Evol 2009; 27:761-72. [PMID: 19903657 DOI: 10.1093/molbev/msp271] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 01/03/2023]
Abstract
Gene duplication is frequent within gene clusters and plays a fundamental role in evolution by providing a source of new genetic material upon which natural selection can act. Although classical phylogenetic inference methods provide some insight into the evolutionary history of a gene cluster, they are not sufficient on their own to differentiate single from multiple gene duplication events or to answer other questions regarding the nature and size of evolutionary events. In this paper, we present an algorithm that infers a set of optimal evolutionary histories for a gene cluster in a single species, according to a general cost model involving variable-length duplications (in tandem or inverted), deletions, and inversions. We applied our algorithm to the human olfactory receptor and protocadherin gene clusters, showing that the duplication size distribution differs significantly between the two gene families. The algorithm is available through a web interface at http://www-lbit.iro.umontreal.ca/DILTAG/.
Affiliation(s)
- Mathieu Lajoie
- Département d'informatique et de recherche opérationnelle Université de Montréal, Montréal, Canada.
|
50
|
Alekseyev MA, Pevzner PA. Breakpoint graphs and ancestral genome reconstructions. Genome Res 2009; 19:943-57. [PMID: 19218533 PMCID: PMC2675983 DOI: 10.1101/gr.082784.108] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Received: 06/30/2008] [Accepted: 01/22/2009] [Indexed: 11/24/2022]
Abstract
Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomic analysis of entire genomes. We developed an algorithm, MGRA, for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of multiple breakpoint graphs to overcome some limitations of existing approaches to ancestral genome reconstruction. MGRA also generates the rearrangement-based characters guiding the phylogenetic tree reconstruction when the phylogeny is unknown.
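As background for the breakpoint concept used in this abstract, the following minimal sketch counts pairwise breakpoints between two signed linear gene orders. It is the classic two-genome breakpoint count, not MGRA's multiple breakpoint graph or its reconstruction algorithm. The function names and the toy gene orders are hypothetical, and chromosome ends are ignored for simplicity.

```python
from typing import List, Set, Tuple


def adjacencies(order: List[int]) -> Set[Tuple[int, int]]:
    """Collect adjacencies of a signed linear gene order in both reading directions."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the opposite strand
    return adj


def breakpoint_count(genome_a: List[int], genome_b: List[int]) -> int:
    """Count adjacencies of genome_a that are not conserved in genome_b."""
    adj_b = adjacencies(genome_b)
    return sum(1 for pair in zip(genome_a, genome_a[1:]) if pair not in adj_b)


# Hypothetical example: genes 2 and 3 are inverted as a block in the second genome.
reference = [1, 2, 3, 4, 5]
rearranged = [1, -3, -2, 4, 5]
print(breakpoint_count(reference, rearranged))
```

A single inversion disrupts the adjacency on each side of the inverted block and preserves the adjacencies inside it, which is why breakpoint counts serve as a simple proxy for rearrangement distance.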
Affiliation(s)
- Max A. Alekseyev
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
|