1
Greenman CD, Penso-Dolfin L, Wu T. The complexity of genome rearrangement combinatorics under the infinite sites model. J Theor Biol 2020; 501:110335. [DOI: 10.1016/j.jtbi.2020.110335] [Received: 08/17/2019] [Revised: 04/16/2020] [Accepted: 05/14/2020]
2
Dong J, Qi M, Wang S, Yuan X. DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads. Front Genet 2020; 11:924. [PMID: 32849857 PMCID: PMC7433346 DOI: 10.3389/fgene.2020.00924] [Received: 05/29/2020] [Accepted: 07/24/2020]
Abstract
Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and is biologically significant for human cancer evolution and tumorigenesis. Accurate and reliable detection of TDs plays an important role in advancing the early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made it possible to study TDs. However, detection remains challenging because of the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs from short sequencing reads. DINTD first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find possible TD regions. A total variation penalized least squares model is fitted to the read depth and mapping quality signals to denoise them, and a 2D binary search tree is used to search for neighboring points efficiently. To identify the exact breakpoints of the TD regions, split-read signals are also integrated into DINTD. Experimental results on simulated data sets showed that DINTD can outperform other methods in sensitivity, precision, F1-score, and boundary bias. DINTD was further validated on real samples, where its results were consistent with those of other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
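The DBSCAN clustering step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each genomic bin has already been reduced to a 2D point (bin index, normalized read depth); the example points and the `eps` and `min_pts` values are hypothetical. The neighbor query is a brute-force scan rather than the 2D search tree the authors describe, and denoising and split-read refinement are omitted, so this is not DINTD's implementation.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def region_query(i):
        # Brute-force O(n) range query; a spatial index would replace this scan.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster += 1
    return labels


# Hypothetical (bin index, depth ratio) points for bins with elevated depth.
candidate_bins = [(10, 2.0), (11, 2.1), (12, 1.9), (13, 2.0),
                  (40, 2.0),
                  (70, 2.1), (71, 2.0), (72, 1.95)]
labels = dbscan(candidate_bins, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, -1, 1, 1, 1]
```

In this toy run, bins 10 to 13 and 70 to 72 each form one dense group that could be treated as a candidate TD region, while the isolated bin at index 40 is labelled noise.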
Affiliation(s)
- Jinxin Dong
- School of Computer Science and Technology, Xidian University, Xi'an, China; School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Minyong Qi
- School of Computer Science and Technology, Xidian University, Xi'an, China; School of Computer Science and Technology, Liaocheng University, Liaocheng, China
- Shaoqiang Wang
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan
- School of Computer Science and Technology, Xidian University, Xi'an, China
3
Abstract
BACKGROUND The evolutionary origin of gastrulation, defined as a morphogenetic event that leads to the establishment of germ layers, remains a vexing question. Central to this debate is the evolutionary relationship between the cell layers of sponges (poriferans) and eumetazoan germ layers. Despite considerable attention, it remains unclear whether sponge cell layers undergo progressive fate determination akin to eumetazoan primary germ layer formation during gastrulation. RESULTS Here we show by cell-labelling experiments in the demosponge Amphimedon queenslandica that the cell layers established during embryogenesis have no relationship to the cell layers of the juvenile. In addition, juvenile epithelial cells can transdifferentiate into a range of cell types and move between cell layers. Despite the apparent lack of stable cell layers and of fate determination in this sponge, the transcription factor GATA, a highly conserved eumetazoan endomesodermal marker, is expressed consistently in the inner layer of A. queenslandica larvae and juveniles. CONCLUSIONS Our results are compatible with sponge cell layers not undergoing progressive fate determination and thus not being homologous to eumetazoan germ layers. Nonetheless, the expression of GATA in the sponge inner cell layer suggests a shared ancestry with the eumetazoan endomesoderm, and that the ancestral role of GATA in specifying internalised cells may antedate the origin of germ layers. Together, these results support germ layers and gastrulation evolving early in eumetazoan evolution from pre-existing developmental programs used for the simple patterning of cells in the first multicellular animals.
Affiliation(s)
- Nagayasu Nakanishi
- School of Biological Sciences, University of Queensland, Brisbane, QLD 4072, Australia
- Shunsuke Sogabe
- School of Biological Sciences, University of Queensland, Brisbane, QLD 4072, Australia
- Bernard M Degnan
- School of Biological Sciences, University of Queensland, Brisbane, QLD 4072, Australia
4
Pradel N, Bartoli M, Bernadac A, Gimenez G, Ollivier B, Fardeau ML. Isolation of Thermovenabulum gondwanense from a French hot spring and emended description of the species. Antonie Van Leeuwenhoek 2013; 104:271-9. [PMID: 23743634 DOI: 10.1007/s10482-013-9947-8] [Received: 02/21/2013] [Accepted: 05/30/2013]
Abstract
An anaerobic thermophilic bacterium designated CA9F1 was isolated from a thermal spring in France. Strain CA9F1 was observed to grow at temperatures between 55 and 70 °C (optimum 65 °C) and at pH between 6.8 and 9.5 (optimum pH 7.4). Strain CA9F1 does not require salt for growth (0-10 g l⁻¹ NaCl), with an optimum at 1 g l⁻¹. The DNA G+C content was determined to be 38.5 mol% (Tm). The major cellular fatty acids identified were C15:0, C16:0 and C17:0 iso. Based on phenotypic, chemotaxonomic and genotypic properties, strain CA9F1 was identified as Thermovenabulum gondwanense, and this species was studied in more detail. Strain CA9F1 is a Gram-positive bacterium that forms a complex and regular multilayered cell wall structure, characterised here as being due to the presence of an S-layer. The network covers the entire cell surface and forms a hexagonal structure resembling that observed for Deinococcus radiodurans. The main protein component of the S-layer possesses domains comparable to those of the S-layer protein of Halothermothrix orenii. The characteristics of the strain were compared with those of T. gondwanense R270(T), isolated from microbial mats thriving in the thermal waters of a Great Artesian Basin bore runoff channel at 66 °C in Australia. Significant differences were observed between CA9F1 and the type strain. One of the major physiological differences is the inability of CA9F1 to reduce Fe(III). An emended description of T. gondwanense is given.
Affiliation(s)
- Nathalie Pradel
- Aix-Marseille Université, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, UM 110, 13288, Marseille Cedex 09, France
5
Abouelhoda MI, Giegerich R, Behzadi B, Steyaert JM. Alignment of minisatellite maps based on run-length encoding scheme. J Bioinform Comput Biol 2009; 7:287-308. [PMID: 19340916 DOI: 10.1142/s0219720009004060] [Received: 06/01/2008] [Revised: 09/27/2008] [Accepted: 10/23/2008]
Abstract
Successive duplication events are responsible for the evolution of minisatellite maps. Alignment of two minisatellite maps should therefore take these duplication events into account, in addition to the well-known edit operations. All algorithms for computing an optimal alignment of two maps, including the one presented here, first deduce the costs of optimal duplication scenarios for all substrings of the given maps and then incorporate these precomputed costs into the alignment recurrence. However, all previous algorithms addressing this problem depend on the number of distinct map units (the map alphabet) and do not fully exploit the repetitiveness of the map units. In this paper, we present an algorithm that remedies these shortcomings: it is alphabet-independent and based on the run-length encoding scheme. It is the fastest both in theory and in practice, as shown by experimental results. Furthermore, our alignment model is more general than those of previous algorithms and better captures the duplication mechanism. Using our algorithm, we derive quantitative evidence of a directional bias in the growth of minisatellites in the MSY1 dataset.
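The run-length encoding scheme this algorithm builds on can be shown in a few lines of Python. This sketch covers only the encoding of a map into (unit, run length) pairs and its inverse; the duplication-cost and alignment recurrences of the paper are not reproduced, and the example map string is hypothetical.

```python
from itertools import groupby

def run_length_encode(units):
    """Collapse consecutive identical map units into (unit, run_length) pairs."""
    return [(unit, sum(1 for _ in run)) for unit, run in groupby(units)]

def run_length_decode(runs):
    """Expand (unit, run_length) pairs back into the original sequence of units."""
    return [unit for unit, length in runs for _ in range(length)]


# Hypothetical minisatellite map in which each letter is one map unit type.
minisatellite_map = "aaabbac"
runs = run_length_encode(minisatellite_map)
print(runs)  # [('a', 3), ('b', 2), ('a', 1), ('c', 1)]
assert "".join(run_length_decode(runs)) == minisatellite_map
```

Here a seven-unit map becomes four runs; a recurrence that iterates over runs instead of individual units is one way to exploit the repetitiveness the abstract refers to.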
6
Abstract
A number of biological processes can lead to genes being copied within the genome of a given species. Duplicate genes of this kind are called paralogs; such genes share a high degree of sequence similarity and often have closely related functions. Some genes have become widely duplicated to form multigene families, in which the copies are distributed both within the genomes of individual species and across different species. Statistical modelling of gene duplication and the evolution of multigene families currently lags behind well-established models of DNA sequence evolution despite an increasing volume of available data, yet the analysis of multigene families is important as part of a wider effort to understand evolution at the genomic level. This article reviews existing approaches to modelling multigene families and presents various challenges and possibilities for this exciting area of research.
Affiliation(s)
- Tom M W Nye
- School of Mathematics and Statistics, Newcastle University, Newcastle, UK.
7
Abstract
The Hox genes encode transcription factors that play vital roles in the anterior-posterior patterning of all bilaterian phyla studied to date. Additionally, the gain of Hox genes by duplication has been widely implicated as a driving force in the evolution of animal body plans. Because of this, reconstructing the evolution of the Hox cluster has been the focus of intense research interest. It has been commonly assumed that an ancestral four-gene ProtoHox cluster was duplicated early in animal evolution to give rise to the Hox and ParaHox clusters. However, this hypothesis has recently been called into question, and a number of alternative hypotheses of Hox and ParaHox gene evolution have been proposed. Here, we present the first statistical comparisons of current hypotheses of Hox and ParaHox gene evolution. We use two statistical methods that represent two different approaches to the treatment of phylogenetic uncertainty. In the first method, we estimate the maximum-likelihood tree for each hypothesis and compare these trees to one another using a parametric bootstrapping approach. In the second method, we use Bayesian phylogenetics to estimate the posterior distribution of trees, then we calculate the support for each hypothesis from this distribution. The results of both methods are largely congruent. We find that we are able to reject five out of the eight current hypotheses of Hox and ParaHox gene evolution that we consider. We conclude that the ProtoHox cluster is likely to have contained either three or four genes but that there is insufficient phylogenetic signal in the homeodomains to distinguish between these alternatives.
Affiliation(s)
- Robert Lanfear
- Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, UK.
8
Abstract
Tandemly arrayed genes (TAGs) constitute a large fraction of most genomes and play important biological roles. They evolve through unequal recombination, which places duplicated genes next to the original ones (tandem duplications). Many algorithms have been proposed to infer a tandem duplication history for a TAG cluster. However, the presence of different transcriptional orientations in many clusters highlights the fact that processes such as inversions also contribute to their evolution. Moreover, existing algorithms are restricted to the study of TAG evolution in a single species (only paralogous genes are considered). To circumvent these limitations, we consider an evolutionary model for TAGs involving duplication, gene loss, inversion, and speciation events. We present a general framework for inferring ancestral gene orders that minimize the number of inversions over the whole evolutionary history. At the methodological level, this paper integrates three approaches to genome evolution: duplication tree reconstruction, gene tree/species tree reconciliation theory, and the concept of the inversion median used in order-based phylogeny reconstruction. An application to a cluster of olfactory receptor genes in four mammals is presented.
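The duplication, loss and inversion events in this model can be made concrete with a small Python sketch that applies them to a signed gene order. Representing each gene as a (name, strand) pair and marking a duplicated copy with a prime are assumptions made for illustration only. The sketch shows how an inversion following a tandem duplication produces the mixed transcriptional orientations mentioned above; it does not implement the authors' inference framework.

```python
def tandem_duplicate(order, i, j, mark="'"):
    """Copy the segment order[i:j] and insert the copy directly after it."""
    copy = [(name + mark, strand) for name, strand in order[i:j]]
    return order[:j] + copy + order[j:]

def invert(order, i, j):
    """Reverse order[i:j] and flip the transcriptional strand of every gene in it."""
    flipped = [(name, -strand) for name, strand in reversed(order[i:j])]
    return order[:i] + flipped + order[j:]

def lose(order, k):
    """Delete the gene at position k (gene loss)."""
    return order[:k] + order[k + 1:]


# Hypothetical two-gene ancestral cluster, both genes on the + strand.
cluster = [("A", 1), ("B", 1)]
cluster = tandem_duplicate(cluster, 0, 2)  # A+ B+ A'+ B'+
cluster = invert(cluster, 1, 3)            # A+ A'- B- B'+
cluster = lose(cluster, 3)                 # A+ A'- B-
print(cluster)  # [('A', 1), ("A'", -1), ('B', -1)]
```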
9
Abstract
Given a phylogenetic tree for a family of tandemly repeated genes and their signed order on the chromosome, we aim to find the minimum number of inversions compatible with an evolutionary history of this family. This is the first attempt to account for inversions in an evolutionary model of tandemly repeated genes. We present a branch-and-bound algorithm that finds the exact solution, and a polynomial-time heuristic based on the breakpoint distance. We show, on simulated data, that these algorithms can be used to improve phylogenetic inference of tandemly repeated gene families. An application to a published phylogeny of KRAB zinc finger genes is presented.
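The breakpoint distance on which the heuristic is based can be computed directly for two signed gene orders. The Python sketch below uses one common definition (an adjacency (a, b) is conserved if the other order contains a followed by b, or -b followed by -a) and omits the framing genes some authors add at the chromosome ends. The gene orders in the example are hypothetical, and the branch-and-bound search itself is not shown.

```python
def adjacencies(order):
    """All signed adjacencies of an order, each stored in both reading directions."""
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read from the opposite strand
    return adj

def breakpoint_distance(order_a, order_b):
    """Number of adjacencies in order_a that are not conserved in order_b."""
    conserved = adjacencies(order_b)
    return sum(1 for a, b in zip(order_a, order_a[1:]) if (a, b) not in conserved)


identity = [1, 2, 3, 4, 5]
one_inversion = [1, -3, -2, 4, 5]  # genes 2..3 inverted
print(breakpoint_distance(identity, one_inversion))  # 2
print(breakpoint_distance(identity, identity))       # 0
```

Because one inversion changes only the two adjacencies at its boundaries, half the breakpoint count, rounded up, is a lower bound on the number of inversions separating two orders; this is a general property of the measure, not a description of the paper's heuristic.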
Affiliation(s)
- Mathieu Lajoie
- DIRO, Université de Montréal, Montréal H3C 3J7, QC, Canada.
10
Abstract
BACKGROUND The shape of phylogenetic trees has been used to make inferences about the evolutionary process by comparing the shapes of actual phylogenies with those expected under simple models of the speciation process. Previous studies have focused on speciation events, but gene duplication is another lineage-splitting event, analogous to speciation, and gene loss or deletion is analogous to extinction. Measures of the shape of gene family phylogenies can thus be used to investigate the processes of gene duplication and loss. We make the first systematic attempt to use tree shape to study gene duplication using human gene phylogenies. RESULTS We find that gene duplication has produced gene family trees significantly less balanced than expected from a simple model of the process, and less balanced than species phylogenies: the opposite of what might be expected under the 2R hypothesis. CONCLUSION While other explanations are plausible, we suggest that the greater imbalance of gene family trees than of species trees is due to the prevalence of tandem duplications over regional duplications during the evolution of the human genome.
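Tree balance can be quantified in several ways, and the abstract does not say which statistic the authors used. As one illustration, the Python sketch below computes the Colless index (the sum, over internal nodes, of the absolute difference in leaf counts between the two child subtrees) for rooted binary trees written as nested tuples; larger values mean less balanced trees. The trees in the example are hypothetical.

```python
def colless(tree):
    """Return (leaf_count, Colless index) for a rooted binary tree of nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf contributes one tip and no imbalance
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


balanced = (("a", "b"), ("c", "d"))
caterpillar = (((("a", "b"), "c"), "d"), "e")  # maximally unbalanced (pectinate)
print(colless(balanced))     # (4, 0)
print(colless(caterpillar))  # (5, 6)
```

A test of the kind described above would compare such values for observed gene family trees against their distribution under a chosen null model of duplication and loss.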
Affiliation(s)
- James A Cotton
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow, UK
- Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland
- Roderic DM Page
- Division of Environmental and Evolutionary Biology, Institute of Biomedical and Life Sciences, University of Glasgow, Glasgow, UK
11
Elemento O, Gascuel O. An exact and polynomial distance-based algorithm to reconstruct single copy tandem duplication trees. J Discrete Algorithms 2005; 3:362-74. [DOI: 10.1016/j.jda.2004.08.013]
12
Abstract
The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch. Many recent studies deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, duplication trees. In this paper, we deal with the topological rearrangement of these trees. Classical rearrangements used in phylogeny (NNI, SPR, TBR, ...) cannot be applied directly to duplication trees. We show that restricting the neighborhood defined by the SPR (Subtree Pruning and Regrafting) rearrangement to valid duplication trees makes it possible to explore the whole duplication tree space. We use these restricted rearrangements in a local search method that improves an initial tree via successive rearrangements. This method is applied to the optimization of the parsimony and minimum evolution criteria. We show through simulations that this method improves on all existing programs, both in reconstructing the topology of the true tree and in recovering its duplication events. We apply this approach to tandemly repeated human zinc finger genes and observe that our method obtains a much better duplication tree than any other program.
Affiliation(s)
- Denis Bertrand
- Projet Méthodes et Algorithmes pour la Bioinformatique, LIRMM (UMR 5506, CNRS-Univ. Montpellier 2), 161 rue Ada, 34392 Montpellier 5, France