1
|
Chahar N, Dangwal M, Das S. Complex origin, evolution, and diversification of non-canonically organized OVATE-OFP and OVATE-Like OFP gene pair across Embryophyta. Gene 2023; 883:147685. [PMID: 37536399 DOI: 10.1016/j.gene.2023.147685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Revised: 07/21/2023] [Accepted: 07/31/2023] [Indexed: 08/05/2023]
Abstract
Ovate Family Proteins (OFP) is a plant-specific gene family of negative transcriptional regulators. To date, a handful of in-silico studies have provided glimpses into family size, expansion patterns, and genic features across all major plant lineages. A major lacuna exists in understanding the origin of organisational complexity of members such as those arranged in a head-to-head manner, which may lead to transcriptional co-regulation via a common bi-directional promoter. To address this gap, we investigated the origin, organization and evolution of two head-to-head arranged gene pairs of homologs of AtOFP2-AtOFP17, and, AtOFP4-AtOFP20 across Archaeplastida. The ancestral forms of AtOFP2, AtOFP4, AtOFP17, and AtOFP20 are likely to have evolved in the last common ancestor of Embryophyta (land plants) given their complete absence in Rhodophyta and Chlorophyta. The OFP gene family originated and expanded in Bryophyta, including protein variants with complete (OVATE-OFP) or partial (OVATE-Like OFP) OVATE domain; with head-to-head organization present only in Spermatophyta (gymnosperms and angiosperms). Ancestral State Reconstruction revealed the origin of head-to-head organized gene pair in gymnosperms, with both genes being OVATE-OFP (homologs of AtOFP2/4). Phylogenetic reconstruction and copy number analysis suggest the presence of a single copy of the head-to-head arranged pair of OFP2/4 (OVATE)-OFP17/20 (OVATE-Like) in all angiosperms except Brassicaceae, and a duplication event in the last common ancestor of core Brassicaceae approximately 32-54 MYA leading to the origin of AtOFP2-AtOFP17 and AtOFP4-AtOFP20 as paralogs. Synteny analysis of genomic regions harbouring homologs of AtOFP2-AtOFP17, AtOFP4-AtOFP20 and AtOFP2/4-AtOFP17/20 across angiosperms suggested the ancestral nature of the AtOFP2-AtOFP17 gene pair. The present study thus establishes the orthology and evolutionary history of two non-canonically organised gene pairs with variation in their OVATE domain.
The non-canonical organisation, at least in Brassicaceae, has the potential of generating complex transcriptional regulation mediated via a common bi-directional promoter. The study thus lays down a framework to understand the evolution of gene and protein structure, transcriptional regulation and function across a phylogenetic lineage through comparative analyses.
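The head-to-head arrangement discussed in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Gene` record, the function name, and the `max_gap` threshold of 1000 bp are all assumptions made for illustration. Two adjacent genes are treated as head-to-head when the left one lies on the minus strand and the right one on the plus strand, so that their 5' ends face a shared intergenic region.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record with 1-based, inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # '+' or '-'


def head_to_head_pairs(genes: List[Gene], max_gap: int = 1000) -> List[Tuple[str, str, int]]:
    """Return (left, right, gap) for adjacent, divergently transcribed genes.

    max_gap is an illustrative cutoff for the shared intergenic region,
    not a value taken from the study.
    """
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom != right.chrom:
            continue
        gap = right.start - left.end - 1  # bases strictly between the two genes
        # Minus-strand gene on the left, plus-strand gene on the right:
        # both transcription start sites border the same intergenic region.
        if left.strand == "-" and right.strand == "+" and 0 <= gap <= max_gap:
            pairs.append((left.name, right.name, gap))
    return pairs
```

Overlapping genes yield a negative gap and are skipped here; whether such cases should count is a design choice left open in this sketch.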
Affiliation(s)
- Nishu Chahar
- Department of Botany, University of Delhi, Delhi 110 007, India.
| | | | - Sandip Das
- Department of Botany, University of Delhi, Delhi 110 007, India.
| |
|
2
|
Anand S, Lal M, Bhardwaj E, Shukla R, Pokhriyal E, Jain A, Sri T, Srivastava PS, Singh A, Das S. MIR159 regulates multiple aspects of stamen and carpel development and requires dissection and delimitation of differential downstream regulatory network for manipulating fertility traits. Physiology and Molecular Biology of Plants: An International Journal of Functional Plant Biology 2023; 29:1437-1456. [PMID: 38076769 PMCID: PMC10709278 DOI: 10.1007/s12298-023-01377-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/21/2023] [Accepted: 10/18/2023] [Indexed: 12/17/2023]
Abstract
Unravelling genetic networks regulating developmental programs is key to devising and implementing genomics-assisted trait modification strategies. It is crucial to understand the role of small RNAs, and the basis of their ability to modify traits. MIR159 has been previously reported to cause defects in anther development in Arabidopsis; however, the complete spectrum and basis of the defects remained unclear. The present study was therefore undertaken to comprehensively investigate the role of miR159 from Brassica juncea in modulating vegetative and reproductive traits. Owing to the polyploid nature of Brassica, paralogous and homeologous copies of MIR159A, MIR159B, and MIR159C were identified, and analysis of the precursor uncovered extensive structural and sequence variation. The MIR159 locus with mature miR159 showing perfect target complementarity with MYB65 was cloned from Brassica juncea var. Varuna for functional characterization by generating constitutively over-expressing lines in Arabidopsis thaliana Col-0. Apart from statistically significant differences in multiple vegetative traits, drastic differences were observed in stamen and pistil. Over-expression of miR159a led to shortening of filament length and loss of tetradynamous condition. Anthers were apiculate, with improper lobe formation, and unsynchronized cellular growth between connective tissue and anther lobe development. Analysis revealed arrested meiosis/cytokinesis in microspores, and altered lignin deposition pattern in endothecial walls, thus affecting anther dehiscence. In the gynoecium, flaccid, dry stigmatic papillae, and a large embryo sac in the female gametophyte were observed. Over-expression of miR159a thus severely affected pollination and seed-set. Analysis of the transcriptome data revealed components of regulatory networks of anther and carpel developmental pathway, and lignin metabolism that are affected.
Expression analysis allowed us to position the miR159a-MYB65 module in the genetic network of stamen development, involved in pollen-grain maturation; in GA-mediated regulation of stamen development, and in lignin metabolism. The study, on one hand indicates role of miR159a-MYB65 in regulating multiple aspects of reproductive organ development that can be manipulated for trait modification, but also raises several unaddressed questions such as relationship between miR159a and male-meiosis, miR159a and filament elongation for future investigations. Accession numbers: KC204951-KC204960. Project number PRJNA1035268. Supplementary Information The online version contains supplementary material available at 10.1007/s12298-023-01377-7.
Affiliation(s)
- Saurabh Anand
- Department of Botany, University of Delhi, Delhi, 110 007 India
| | - Mukund Lal
- Department of Botany, University of Delhi, Delhi, 110 007 India
| | - Ekta Bhardwaj
- Department of Botany, University of Delhi, Delhi, 110 007 India
| | - Richa Shukla
- Department of Botany, University of Delhi, Delhi, 110 007 India
| | - Ekta Pokhriyal
- Department of Botany, University of Delhi, Delhi, 110 007 India
| | - Aditi Jain
- Department of Botany, University of Delhi, Delhi, 110 007 India
| | - Tanu Sri
- TERI School of Advanced Studies, Plot No. 10, Institutional Area, Vasant Kunj, New Delhi, 110 070 India
| | - P. S. Srivastava
- Department of Biotechnology, Jamia Hamdard, Hamdard Nagar, New Delhi, Delhi 110 062 India
| | - Anandita Singh
- TERI School of Advanced Studies, Plot No. 10, Institutional Area, Vasant Kunj, New Delhi, 110 070 India
| | - Sandip Das
- Department of Botany, University of Delhi, Delhi, 110 007 India
| |
|
3
|
Katiyar A, Geeta R, Das S, Mudgil Y. Comparative genomics, microsynteny, ancestral state reconstruction and selection pressure analysis across distinctive genomes and sub-genomes of Brassicaceae for analysis of evolutionary history of VQ gene family. Physiology and Molecular Biology of Plants: An International Journal of Functional Plant Biology 2023; 29:1505-1523. [PMID: 38076762 PMCID: PMC10709281 DOI: 10.1007/s12298-023-01347-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 07/19/2023] [Accepted: 08/11/2023] [Indexed: 10/04/2024]
Abstract
Any unfavorable condition that affects the metabolism, growth, or development of plants is considered plant stress. The molecular response of plants towards abiotic stresses involves signaling to cellular components, repressing transcription factors, and subsequently induced metabolic changes. Most valine-glutamine (VQ) motif-containing genes in plants encode regulatory proteins that interact with transcription factors and modulate their activity as transcription regulators. Several VQ proteins regulate plant development and stress responses. In spite of the functional importance of VQs, there is relatively little information about their evolutionary history in Brassicaceae or beyond. Brassicaceae is characterized by paleoploidy, mesopolyploidy, and neopolyploidy, offering a resource for studying evolution and diversification. In the current study, we performed phylogenetic analysis of the VQ gene family along with comparative genomics, microsynteny and evolutionary rate analysis across seven species of Brassicaceae. Our findings revealed the following: (1) a large segmental duplication in the shared common ancestor of the family Brassicaceae resulted in paralogies of VQ1-VQ10, VQ15-VQ24, VQ16-VQ23, VQ17-VQ25, VQ18-VQ26, VQ22-VQ27; (2) chromosomal mapping revealed diverse distributions of the gene family; (3) duplicated segments undergo varying degrees of retention and loss; and (4) out of the 12 paralogous members, most of the genes are under purifying selection. However, VQ23 in Brassicaceae stands out as it is under positive selection, indicating the need for further investigation. Overall, our results clearly establish that the ancestral VQ1/VQ10, VQ15/VQ24, VQ16/VQ23, VQ17/VQ25, VQ18/VQ26, VQ22/VQ27 genes duplicated in the shared common ancestor of Brassicaceae. Supplementary Information The online version contains supplementary material available at 10.1007/s12298-023-01347-z.
Affiliation(s)
- Arpana Katiyar
- Department of Botany, University of Delhi, New Delhi, 110007 India
| | - R. Geeta
- Department of Botany, University of Delhi, New Delhi, 110007 India
| | - Sandip Das
- Department of Botany, University of Delhi, New Delhi, 110007 India
| | - Yashwanti Mudgil
- Department of Botany, University of Delhi, New Delhi, 110007 India
| |
|
4
|
Xu MM, Gu LH, Lv WY, Duan SC, Li LW, Du Y, Lu LZ, Zeng T, Hou ZC, Ma ZS, Chen W, Adeola AC, Han JL, Xu TS, Dong Y, Zhang YP, Peng MS. Chromosome-level genome assembly of the Muscovy duck provides insight into fatty liver susceptibility. Genomics 2022; 114:110518. [PMID: 36347326 DOI: 10.1016/j.ygeno.2022.110518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 11/01/2022] [Accepted: 11/04/2022] [Indexed: 11/07/2022]
Abstract
The Muscovy duck (Cairina moschata) is an economically important poultry species, which is susceptible to fatty liver. Thus, the Muscovy duck may serve as an excellent candidate animal model of non-alcoholic fatty liver disease. However, the mechanisms underlying fatty liver development in this species are poorly understood. In this study, we report a chromosome-level genome assembly of the Muscovy duck, with a contig N50 of 11.8 Mb and scaffold N50 of 83.16 Mb. The susceptibility of Muscovy duck to fatty liver was mainly attributed to weak lipid catabolism capabilities (fatty acid β-oxidation and lipolysis). Furthermore, conserved noncoding elements (CNEs) showing accelerated evolution contributed to fatty liver formation by down-regulating the expression of genes involved in hepatic lipid catabolism. We propose that the susceptibility of Muscovy duck to fatty liver is an evolutionary by-product. In conclusion, this study revealed the potential mechanisms underlying the susceptibility of Muscovy duck to fatty liver.
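The contig and scaffold N50 values quoted in this abstract follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name and the toy lengths used in the note are illustrative and are not taken from the paper.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Compute N50 from a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the cumulative
        # length to at least half of the assembly size.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

For toy lengths 2, 3, 4, 5 and 6 (total 20), the two longest sequences already cover 11 bases, so `n50` returns 5.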
Affiliation(s)
- Ming-Min Xu
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China
| | - Li-Hong Gu
- Institute of Animal Science & Veterinary Medicine, Hainan Academy of Agricultural Sciences, Haikou 571100, China
| | - Wan-Yue Lv
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650091, China
| | | | - Lian-Wei Li
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China; Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
| | - Yuan Du
- Nowbio Biotechnology Company, Kunming 650201, China
| | - Li-Zhi Lu
- Institute of Animal Husbandry and Veterinary Science, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
| | - Tao Zeng
- Institute of Animal Husbandry and Veterinary Science, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, China
| | - Zhuo-Cheng Hou
- National Engineering Laboratory for Animal Breeding and Key Laboratory of Animal Genetics, Breeding and Reproduction, MARA; College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
| | - Zhanshan Sam Ma
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China; Computational Biology and Medical Ecology Lab, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
| | - Wei Chen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China; Key Laboratory for Agro-Biodiversity and Pest Control of Ministry of Education, Yunnan Agricultural University, Kunming 650201, China
| | - Adeniyi C Adeola
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
| | - Jian-Lin Han
- CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China; Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi 00100, Kenya
| | - Tie-Shan Xu
- Tropical Crops Genetic Resources Research Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China.
| | - Yang Dong
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan Agricultural University, Kunming 650201, China; Key Laboratory for Agro-Biodiversity and Pest Control of Ministry of Education, Yunnan Agricultural University, Kunming 650201, China.
| | - Ya-Ping Zhang
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China; State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650091, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China.
| | - Min-Sheng Peng
- State Key Laboratory of Genetic Resources and Evolution & Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China.
| |
|
5
|
Jain C, Gibney D, Thankachan SV. Algorithms for Colinear Chaining with Overlaps and Gap Costs. J Comput Biol 2022; 29:1237-1251. [PMID: 36351202 DOI: 10.1089/cmb.2022.0266] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022] Open
Abstract
Colinear chaining has proven to be a powerful heuristic for finding near-optimal alignments of long DNA sequences (e.g., long reads or a genome assembly) to a reference. It is used as an intermediate step in several alignment tools that employ a seed-chain-extend strategy. Despite this popularity, efficient subquadratic time algorithms for the general case where chains support anchor overlaps and gap costs are not currently known. We present algorithms to solve the colinear chaining problem with anchor overlaps and gap costs in Õ(n) time, where n denotes the count of anchors. The degree of the polylogarithmic factor depends on the type of anchors used (e.g., fixed-length anchors) and the type of precedence an optimal anchor chain is required to satisfy. We also establish the first theoretical connection between colinear chaining cost and edit distance. Specifically, we prove that for a fixed set of anchors under a carefully designed chaining cost function, the optimal "anchored" edit distance equals the optimal colinear chaining cost. The anchored edit distance for two sequences and a set of anchors is only a slight generalization of the standard edit distance. It adds an additional cost of one to an alignment of two matching symbols that are not supported by any anchor. Finally, we demonstrate experimentally that optimal colinear chaining cost under the proposed cost function can be computed orders of magnitude faster than edit distance, and achieves correlation coefficient >0.9 with edit distance for closely as well as distantly related sequences.
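To make the chaining problem concrete, the sketch below shows the textbook quadratic-time dynamic program for colinear chaining. It is a baseline only and not the subquadratic algorithm of this paper: it forbids anchor overlaps, and its scoring scheme (sum of anchor lengths minus a weighted difference of the two gap lengths) and the `gap_weight` parameter are assumptions chosen for illustration.

```python
from typing import List, Tuple

# An anchor is an exact match: (query_start, ref_start, length).
Anchor = Tuple[int, int, int]


def chain_anchors(anchors: List[Anchor], gap_weight: float = 0.5) -> Tuple[float, List[Anchor]]:
    """Return the best chain score and chain using an O(n^2) dynamic program."""
    if not anchors:
        return 0.0, []
    order = sorted(anchors)  # sorted by query start, then reference start
    n = len(order)
    best = [float(a[2]) for a in order]  # best score of a chain ending at each anchor
    prev = [-1] * n                      # back-pointers for chain recovery
    for j in range(n):
        qj, rj, lj = order[j]
        for i in range(j):
            qi, ri, li = order[i]
            # Strict precedence: anchor i must end before anchor j starts
            # on both sequences (no overlaps in this simplified version).
            if qi + li <= qj and ri + li <= rj:
                gap = abs((qj - (qi + li)) - (rj - (ri + li)))
                candidate = best[i] + lj - gap_weight * gap
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i
    end = max(range(n), key=best.__getitem__)
    score = best[end]
    chain = []
    while end != -1:
        chain.append(order[end])
        end = prev[end]
    chain.reverse()
    return score, chain
```

The nested loop is what makes this baseline quadratic in the number of anchors; the abstract's contribution is to handle overlaps and gap costs while avoiding that cost.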
Affiliation(s)
- Chirag Jain
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, India
| | - Daniel Gibney
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Sharma V. Thankachan
- Department of Computer Science, University of Central Florida, Orlando, Florida, USA
| |
|
6
|
Billakurthi K, Schulze S, Schulz ELM, Sage TL, Schreier TB, Hibberd JM, Ludwig M, Westhoff P. Shedding light on AT1G29480 of Arabidopsis thaliana-An enigmatic locus restricted to Brassicacean genomes. Plant Direct 2022; 6:e455. [PMID: 36263108 PMCID: PMC9576117 DOI: 10.1002/pld3.455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 09/02/2022] [Accepted: 09/20/2022] [Indexed: 06/16/2023]
Abstract
A key feature of C4 Kranz anatomy is the presence of an enlarged, photosynthetically highly active bundle sheath whose cells contain large numbers of chloroplasts. With the aim to identify novel candidate regulators of C4 bundle sheath development, we performed an activation tagging screen with Arabidopsis thaliana. The reporter gene used encoded a chloroplast-targeted GFP protein preferentially expressed in the bundle sheath, and the promoter of the C4 phosphoenolpyruvate carboxylase gene from Flaveria trinervia served as activation tag because of its activity in all chlorenchymatous tissues of A. thaliana. Primary mutants were selected based on their GFP signal intensity, and one stable mutant named kb-1 with a significant increase in GFP fluorescence intensity was obtained. Despite the increased GFP signal, kb-1 showed no alterations to bundle sheath anatomy. The causal locus, AT1G29480, is specific to the Brassicaceae with its second exon being conserved. Overexpression and reconstitution studies confirmed that AT1G29480, and specifically its second exon, were sufficient for the enhanced GFP phenotype, which was not dependent on translation of the locus or its parts into protein. We conclude, therefore, that the AT1G29480 locus enhances the GFP reporter gene activity via an RNA-based mechanism.
Affiliation(s)
- Kumari Billakurthi
- Institute of Plant Molecular and Developmental Biology, Universitätsstrasse 1, Heinrich‐Heine‐University, Duesseldorf, Germany
- Cluster of Excellence on Plant Sciences ‘From Complex Traits Towards Synthetic Modules’, Düsseldorf‐Cologne, Germany
- Department of Plant Sciences, Downing Street, University of Cambridge, Cambridge, UK
| | - Stefanie Schulze
- Institute of Plant Molecular and Developmental Biology, Universitätsstrasse 1, Heinrich‐Heine‐University, Duesseldorf, Germany
| | - Eva Lena Marie Schulz
- Institute of Plant Molecular and Developmental Biology, Universitätsstrasse 1, Heinrich‐Heine‐University, Duesseldorf, Germany
| | - Tammy L. Sage
- Department of Ecology and Evolutionary Biology, The University of Toronto, Toronto, Ontario, Canada
| | - Tina B. Schreier
- Department of Plant Sciences, Downing Street, University of Cambridge, Cambridge, UK
| | - Julian M. Hibberd
- Department of Plant Sciences, Downing Street, University of Cambridge, Cambridge, UK
| | - Martha Ludwig
- School of Molecular Sciences, University of Western Australia, Perth, Western Australia, Australia
| | - Peter Westhoff
- Institute of Plant Molecular and Developmental Biology, Universitätsstrasse 1, Heinrich‐Heine‐University, Duesseldorf, Germany
- Cluster of Excellence on Plant Sciences ‘From Complex Traits Towards Synthetic Modules’, Düsseldorf‐Cologne, Germany
| |
|
7
|
Kille B, Balaji A, Sedlazeck FJ, Nute M, Treangen TJ. Multiple genome alignment in the telomere-to-telomere assembly era. Genome Biol 2022; 23:182. [PMID: 36038949 PMCID: PMC9421119 DOI: 10.1186/s13059-022-02735-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/20/2021] [Accepted: 07/21/2022] [Indexed: 01/22/2023] Open
Abstract
With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.
Affiliation(s)
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Michael Nute
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
|
8
|
Lal M, Bhardwaj E, Chahar N, Yadav S, Das S. Comprehensive analysis of 1R- and 2R-MYBs reveals novel genic and protein features, complex organisation, selective expansion and insights into evolutionary tendencies. Funct Integr Genomics 2022; 22:371-405. [PMID: 35260976 DOI: 10.1007/s10142-022-00836-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2021] [Revised: 02/10/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
The Myeloblastosis (MYB) family, the largest plant transcription factor family, has been subcategorised based on the number and type of repeats in the MYB domain. In spite of several reports, evolution of MYB genes and repeats remains enigmatic. Brassicaceae members are endowed with complex genomes, including dysploidy, because of the family's unique history with multiple rounds of polyploidisation, genomic fractionations and rearrangements. The present study is an attempt to gain insights into the complexities of MYB family diversity, understand impacts of genome evolution on gene families and develop an evolutionary framework to understand the origin of various subcategories of the MYB gene family. We identified and analysed 1129 MYBs that included 1R-, 2R-, 3R- and atypical-MYBs across sixteen species representing protists, fungi, animals and plants, excluding MYBs identified from Brassicaceae other than Arabidopsis thaliana; in addition, a total of 1137 2R-MYB genes from six Brassicaceae species were also analysed. Comparative analysis revealed predominance of 1R-MYBs in protists, fungi, animals and lower plants. Phylogenetic reconstruction and analysis of selection pressure suggested ancestral nature of R1-type repeat containing 1R-MYBs that might have undergone intragenic duplication to form multi-repeat MYBs. Distinct differences in gene structure between 1R-MYBs and 2R-MYBs were observed regarding intron number, the ratio of gene length to coding DNA sequence (CDS) length and the length of exons encoding the MYB domain. Conserved as well as novel and lineage-specific intron phases were identified. Analyses of physicochemical properties revealed drastic differences indicating functional diversification in MYBs. Phylogenetic reconstruction of 1R- and 2R-MYB genes revealed a shared structure-function relationship in clades, which was supported when transcriptome data were analysed in silico.
Comparative genomics to study distribution pattern and mapping of 2R-MYBs revealed congruency and greater degree of synteny and collinearity among closely related species. Micro-synteny analysis of genomic segments revealed high conservation of genes that are immediately flanking the surrounding tandemly organised 2R-MYBs along with instances of local duplication, reorganisations and genome fractionation. In summary, polyploidy, dysploidy, reshuffling and genome fractionation were found to cause loss or gain of 2R-MYB genes. The findings need to be supported with functional validation to understand gene structure-function relationship along the evolutionary lineage and adaptive strategies based on comparative functional genomics in plants.
Affiliation(s)
- Mukund Lal
- Department of Botany, University of Delhi, Delhi, 110007, India
| | - Ekta Bhardwaj
- Department of Botany, University of Delhi, Delhi, 110007, India
| | - Nishu Chahar
- Department of Botany, University of Delhi, Delhi, 110007, India
| | - Shobha Yadav
- Department of Botany, University of Delhi, Delhi, 110007, India
| | - Sandip Das
- Department of Botany, University of Delhi, Delhi, 110007, India.
| |
|
9
|
Shajii A, Numanagić I, Leighton AT, Greenyer H, Amarasinghe S, Berger B. A Python-based programming language for high-performance computational genomics. Nat Biotechnol 2021; 39:1062-1064. [PMID: 34282326 PMCID: PMC8542382 DOI: 10.1038/s41587-021-00985-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023]
Affiliation(s)
- Ariya Shajii
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Ibrahim Numanagić
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
| | - Alexander T Leighton
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Haley Greenyer
- Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
| | - Saman Amarasinghe
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
| |
|
10
|
Wu B, Feng C, Zhu C, Xu W, Yuan Y, Hu M, Yuan K, Li Y, Ren Y, Zhou Y, Jiang H, Qiu Q, Wang W, He S, Wang K. The Genomes of Two Billfishes Provide Insights into the Evolution of Endothermy in Teleosts. Mol Biol Evol 2021; 38:2413-2427. [PMID: 33533895 PMCID: PMC8136490 DOI: 10.1093/molbev/msab035] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/19/2022] Open
Abstract
Endothermy is a typical convergent phenomenon which has evolved independently at least eight times in vertebrates, and is of significant advantage to organisms in extending their niches. However, how vertebrates other than mammals or birds, especially teleosts, achieve endothermy has not previously been fully understood. In this study, we sequenced the genomes of two billfishes (swordfish and sailfish), members of a representative lineage of endothermic teleosts. Convergent amino acid replacements were observed in proteins related to heat production and the visual system in two endothermic teleost lineages, billfishes and tunas. The billfish-specific genetic innovations were found to be associated with heat exchange, thermoregulation, and the specialized morphology, including elongated bill, enlarged dorsal fin in sailfish and loss of the pelvic fin in swordfish.
Affiliation(s)
- Baosheng Wu
- Institute of Deep-Sea Science and Engineering, Chinese Academy of Sciences, Sanya, China; University of Chinese Academy of Sciences, Beijing, China
| | - Chenguang Feng
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China; The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
| | - Chenglong Zhu
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Wenjie Xu
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Yuan Yuan
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Mingliang Hu
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Ke Yuan
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Yongxin Li
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Yandong Ren
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Yang Zhou
- Institute of Deep-Sea Science and Engineering, Chinese Academy of Sciences, Sanya, China; University of Chinese Academy of Sciences, Beijing, China
| | - Haifeng Jiang
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
| | - Qiang Qiu
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Wen Wang
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
| | - Shunping He
- Institute of Deep-Sea Science and Engineering, Chinese Academy of Sciences, Sanya, China.,School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Kun Wang
- School for Ecological and Environmental Sciences, Northwestern Polytechnical University, Xi'an, China
|
11
|
Singh S, Singh A. A prescient evolutionary model for genesis, duplication and differentiation of MIR160 homologs in Brassicaceae. Mol Genet Genomics 2021; 296:985-1003. [PMID: 34052911 DOI: 10.1007/s00438-021-01797-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/03/2020] [Accepted: 05/21/2021] [Indexed: 12/18/2022]
Abstract
MicroRNA160 is a class of nitrogen-starvation responsive genes which governs establishment of root system architecture by down-regulating AUXIN RESPONSE FACTOR genes (ARF10, ARF16 and ARF17) in plants. The high copy number of MIR160 variants discovered by us from land plants, especially polyploid crop Brassicas, posed questions regarding genesis, duplication, evolution and function. Absence of studies on the impact of whole genome and segmental duplication on retention and evolution of MIR160 homologs in descendant plant lineages prompted us to undertake the current study. Herein, we describe the ancestry and fate of MIR160 homologs in Brassicaceae in the context of polyploidy-driven genome re-organization, copy number and differentiation. Paralogy amongst Brassicaceae MIR160a, MIR160b and MIR160c was inferred using phylogenetic analysis of 468 MIR160 homologs from land plants. The evolutionarily distinct MIR160a was found to represent the ancestral form and progenitor of MIR160b and MIR160c. The chronology of evolutionary events resulting in the origin and diversification of genomic loci containing MIR160 homologs was delineated using derivatives of comparative synteny. A prescient model for causality of segmental duplications in establishment of paralogy in Brassicaceae MIR160, with whole genome duplication accentuating the copy number increase, is posited, in which post-segmental duplication events, viz. differential gene fractionation, gene duplications and inversions, are shown to drive divergence of chromosome segments. While mutations caused the diversification of MIR160a, MIR160b and MIR160c, duplicated segments containing these diversified genes suffered gene rearrangements via gene loss, duplications and inversions. Yet the topologies of the phylogenetic and phenetic trees were found to be congruent, suggesting a similar evolutionary trajectory. Over 80% of Brassicaceae genomes and subgenomes showed preferential retention of a single copy each of MIR160a, MIR160b and MIR160c, suggesting functional relevance. Thus, our study provides a blueprint for reconstructing ancestry and phylogeny of MIRNA gene families at the genomics level and analyzing the impact of polyploidy on organismal complexity. Such studies are critical for understanding the molecular basis of agronomic traits and deploying appropriate candidates for crop improvement.
Affiliation(s)
- Swati Singh
- Department of Biotechnology, TERI School of Advanced Studies, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India.,Department of Life Sciences, School of Basic Sciences and Research, Sharda University, Plot no. 32-34, Knowledge Park III, Greater Noida, Uttar Pradesh, 201310, India
| | - Anandita Singh
- Department of Biotechnology, TERI School of Advanced Studies, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India.
|
12
|
Hendelman A, Zebell S, Rodriguez-Leal D, Dukler N, Robitaille G, Wu X, Kostyun J, Tal L, Wang P, Bartlett ME, Eshed Y, Efroni I, Lippman ZB. Conserved pleiotropy of an ancient plant homeobox gene uncovered by cis-regulatory dissection. Cell 2021; 184:1724-1739.e16. [PMID: 33667348 DOI: 10.1016/j.cell.2021.02.001] [Citation(s) in RCA: 136] [Impact Index Per Article: 34.0] [Received: 11/17/2020] [Revised: 01/03/2021] [Accepted: 02/01/2021] [Indexed: 01/09/2023]
Abstract
Divergence of gene function is a hallmark of evolution, but assessing functional divergence over deep time is not trivial. The few alleles available for cross-species studies often fail to expose the entire functional spectrum of genes, potentially obscuring deeply conserved pleiotropic roles. Here, we explore the functional divergence of WUSCHEL HOMEOBOX9 (WOX9), suggested to have species-specific roles in embryo and inflorescence development. Using a cis-regulatory editing drive system, we generate a comprehensive allelic series in tomato, which revealed hidden pleiotropic roles for WOX9. Analysis of accessible chromatin and conserved cis-regulatory sequences identifies the regions responsible for this pleiotropic activity, the functions of which are conserved in groundcherry, a tomato relative. Mimicking these alleles in Arabidopsis, distantly related to tomato and groundcherry, reveals new inflorescence phenotypes, exposing a deeply conserved pleiotropy. We suggest that targeted cis-regulatory mutations can uncover conserved gene functions and reduce undesirable effects in crop improvement.
Affiliation(s)
- Anat Hendelman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Sophia Zebell
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Noah Dukler
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Gina Robitaille
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Xuelin Wu
- The Salk Institute for Biological Research, San Diego, CA, USA
| | - Jamie Kostyun
- Biology Department, University of Massachusetts Amherst, Amherst, MA, USA
| | - Lior Tal
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
| | - Peipei Wang
- Institute of Plant Sciences and Genetics in Agriculture, The Robert H. Smith Faculty of Agriculture, The Hebrew University, Rehovot, Israel
| | | | - Yuval Eshed
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
| | - Idan Efroni
- Institute of Plant Sciences and Genetics in Agriculture, The Robert H. Smith Faculty of Agriculture, The Hebrew University, Rehovot, Israel.
| | - Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
|
13
|
Abstract
Aims:
To develop a robust and more accurate method for identifying transcription factor binding sites (TFBS) for gene expression.
Background:
Deep neural networks (DNNs) have shown promising growth in solving complex machine learning problems. DNNs are readily replacing conventional techniques in computer vision, signal processing, healthcare, and genomics. Understanding DNA sequences is always a crucial task in healthcare and regulatory genomics. For DNA motif prediction, choosing the right dataset with a sufficient number of input sequences is crucial in order to design an effective model.
Objective:
To design a new algorithm that works on different datasets while improving performance for TFBS prediction.
Methods:
With the help of Layerwise Relevance Propagation, the proposed algorithm identifies the invariant features with adaptive noise patterns.
Results:
Performance is compared with standard as well as recent methods using various metrics, and a significant improvement is noted.
Conclusion:
By identifying the invariant and robust features in the DNA sequences, the classification performance can be increased.
Affiliation(s)
- Kanu Geete
- Department of Computer Science & Engineering, Maulana Azad National Institute of Technology, Bhopal, India
| | - Manish Pandey
- Department of Computer Science & Engineering, Maulana Azad National Institute of Technology, Bhopal, India
|
14
|
Sri T, Gupta B, Tyagi S, Singh A. Homeologs of Brassica SOC1, a central regulator of flowering time, are differentially regulated due to partitioning of evolutionarily conserved transcription factor binding sites in promoters. Mol Phylogenet Evol 2020; 147:106777. [PMID: 32126279 DOI: 10.1016/j.ympev.2020.106777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/18/2019] [Revised: 02/03/2020] [Accepted: 02/26/2020] [Indexed: 01/06/2023]
Abstract
Evolution of Brassica genome post-polyploidization reveals asymmetrical genome fractionation and copy number variation. Herein, we describe the impact of promoter divergence among SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1) homeologs on expression and function in Brassica spp. SOC1, a regulated floral pathway integrator, is conserved as 3 redundant homeologs in diploid Brassicas. Even with high sequence identity within coding regions (92.8-100%), the spatio-temporal expression patterns of 9 SOC1 homologs in B. juncea and B. nigra indicate regulatory divergence. While LF and MF2 SOC1 homeologs are upregulated during floral transition, MF1 is barely expressed. Also, MF2 homeolog levels do not decline post-flowering, unlike LF. To investigate the underlying source of divergence, we analyzed the sequence and phylogeny of all reported (22) and isolated (21) upstream regions of Brassica SOC1. Full length upstream regions (4712-19189 bp) reveal 5 ubiquitously conserved ancestral Blocks, harboring binding sites of 18 TFs (TFBSs) characterized in Arabidopsis thaliana. The orthologs of these TFBSs are differentially conserved among Brassica SOC1 homeologs, imparting expression divergence. No crucial TFBSs are exclusively lost from LF_SOC1 promoter, while MF1_SOC1 has lost the NF-Y binding site crucial for SOC1 activation by CONSTANS. MF2_SOC1 homeologs have lost important TFBSs (SEP3, AP1 and SMZ), responsible for SOC1 repression post-flowering. BjuAALF_SOC1 promoter (proximal 2 kb) shows ubiquitous reporter expression in B. juncea cv. Varuna transgenics, while BjuAAMF1_SOC1 promoter shows absence of reporter expression, validating the impact of TFBS divergence. Conservation of the original primary protein sequence is discovered in B. rapa homeologs (46) of 18 TFs. Co-regulation pattern of these TFs appeared similar for B. rapa LF and MF2 SOC1 homeologs; MF1 shows significant variation. Strong regulatory association is recorded for AP1, AP2, SEP3, FLC and CONSTANS/NF-Y, highlighting their importance in homeolog-specific SOC1 regulation. Correlation of B. juncea AP1, AP2 and FLC expression with SOC1 homeologs also complies with the TFBS differences. We thus conclude that redundant SOC1 loci contribute differentially to cumulative expression of SOC1 due to divergent selection of ancestral TFBSs.
Affiliation(s)
- Tanu Sri
- Department of Biotechnology, TERI School of Advanced Studies, 10, Institutional Area, Vasant Kunj, New Delhi 110070, India
| | - Bharat Gupta
- Department of Biotechnology, TERI School of Advanced Studies, 10, Institutional Area, Vasant Kunj, New Delhi 110070, India
| | - Shikha Tyagi
- Department of Biotechnology, TERI School of Advanced Studies, 10, Institutional Area, Vasant Kunj, New Delhi 110070, India
| | - Anandita Singh
- Department of Biotechnology, TERI School of Advanced Studies, 10, Institutional Area, Vasant Kunj, New Delhi 110070, India.
|
15
|
Lin HN, Hsu WL. GSAlign: an efficient sequence alignment tool for intra-species genomes. BMC Genomics 2020; 21:182. [PMID: 32093618 PMCID: PMC7041101 DOI: 10.1186/s12864-020-6569-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/27/2019] [Accepted: 02/10/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Personal genomics and comparative genomics are becoming more important in clinical practice and genome research. Both fields require sequence alignment to discover sequence conservation and variation. Though many methods have been developed, some are designed for small genome comparison while others are not efficient for large genome comparison. Moreover, most existing genome comparison tools have not been systematically evaluated for the correctness of their sequence alignments. A wrong sequence alignment would produce false sequence variants. RESULTS: In this study, we present GSAlign, an efficient sequence alignment tool for intra-species genomes that handles large genome sequence alignment efficiently and identifies sequence variants from the alignment result. We estimate performance by measuring the correctness of predicted sequence variations. The experimental results demonstrate that GSAlign is not only faster than most existing state-of-the-art methods, but also identifies sequence variants with high accuracy. CONCLUSIONS: As more genome sequences become available, the demand for genome comparison is increasing. Therefore, an efficient and robust algorithm is most desirable. We believe GSAlign can be a useful tool. It exhibits ultra-fast alignment as well as high accuracy and sensitivity for detecting sequence variations.
Affiliation(s)
- Hsin-Nan Lin
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan.
|
16
|
Tambong JT. Taxogenomics and Systematics of the Genus Pantoea. Front Microbiol 2019; 10:2463. [PMID: 31736906 PMCID: PMC6831937 DOI: 10.3389/fmicb.2019.02463] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/25/2019] [Accepted: 10/14/2019] [Indexed: 11/28/2022] Open
Abstract
Members of the genus Pantoea are Gram-negative bacteria isolated from various environments. Taxonomic affiliation based on multilocus sequence analysis (MLSA) is used routinely for inferring accurate phylogeny and identification of bacterial species and genera. Partial sequences of five housekeeping genes (fusA, gyrB, leuS, rpoB, and pyrG) were extracted from 206 draft or complete genomes of Pantoea strains publicly available in databases and analyzed together with the representative sequences of the 25 validly published Pantoea type strains to verify and assess their phylogenetic assignations. Of a total of 159 strains assigned to species level, 11.3% of the non-type strains were incorrectly assigned within suitable Pantoea species. The highest proportion of misidentified strains was recorded in Pantoea vagans, with 8 out of 15 (53.3%) inaccurate assignations at the species level. One probable reason for this incorrect classification could be the method previously used for strain identification. Forty-seven (22.8%) genome sequences were from strains identified at the genus level only (Pantoea sp.). A combination of MLSA, average nucleotide identities [ANI and MUMmer-based ANI (ANIm)], tetranucleotide usage pattern (TETRA), and genome-based DNA-DNA hybridization (gDDH) data was used to accurately assign 25 of the 47 strains to validly published Pantoea species, while 17 strains could be assigned as putative novel species within the genus Pantoea. Four genomes designated as Pantoea sp. were identified as Mixta calida. Positive and significant correlation coefficients were computed between MLSA and all the indices derived from whole-genome sequences being proposed for species delimitation. gDDH exhibited the best correlation with MLSA while TETRA was the worst. Accurate species-level identification is key to a better understanding of bacterial diversity and evolution. The MLSA scheme used here could be instrumental in determining the correct taxonomic status of new whole-genome sequenced Pantoea strains, especially non-type strains, before they are deposited in public databases.
Affiliation(s)
- James T Tambong
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
|
17
|
Leimeister CA, Dencker T, Morgenstern B. Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points. Bioinformatics 2019; 35:211-218. [PMID: 29992260 PMCID: PMC6330006 DOI: 10.1093/bioinformatics/bty592] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/23/2017] [Accepted: 07/09/2018] [Indexed: 01/30/2023] Open
Abstract
Motivation: Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods. Results: In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points. Availability and implementation: http://spacedanchor.gobics.de. Supplementary information: Supplementary data are available at Bioinformatics online.
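The spaced-word matching step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the +1/-1 scoring over all pattern positions, and the threshold semantics are assumptions made for illustration, and the X-drop extension step is omitted.

```python
from collections import defaultdict


def spaced_word(seq, start, pattern):
    """Characters of seq at the match ('1') positions of pattern, from start."""
    return "".join(seq[start + k] for k, bit in enumerate(pattern) if bit == "1")


def spaced_word_matches(seq_a, seq_b, pattern, threshold):
    """Find ungapped window pairs that agree at every match position.

    pattern is a binary string: '1' = match position, '0' = don't-care.
    Scoring (+1 per identical column, -1 otherwise, over the whole window)
    is a hypothetical stand-in for a real substitution matrix.
    Returns (pos_in_a, pos_in_b, score) for pairs with score >= threshold.
    """
    length = len(pattern)
    # Index every window of seq_a by its spaced word.
    index = defaultdict(list)
    for i in range(len(seq_a) - length + 1):
        index[spaced_word(seq_a, i, pattern)].append(i)

    hits = []
    for j in range(len(seq_b) - length + 1):
        for i in index.get(spaced_word(seq_b, j, pattern), []):
            score = sum(
                1 if seq_a[i + k] == seq_b[j + k] else -1 for k in range(length)
            )
            if score >= threshold:  # the "filter" in filtered spaced-word matches
                hits.append((i, j, score))
    return hits


# Toy sequences: the windows at position 0 agree at pattern positions 0, 2, 3
# and differ only at the don't-care position 1.
print(spaced_word_matches("ACGTA", "ATGTC", "1011", threshold=2))
```

Hashing windows by their spaced word keeps the search close to linear in sequence length for typical inputs; a real anchoring pipeline would then extend each surviving hit into a longer local alignment.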
Affiliation(s)
| | - Thomas Dencker
- Department of Bioinformatics, Institute of Microbiology and Genetics
| | - Burkhard Morgenstern
- Department of Bioinformatics, Institute of Microbiology and Genetics.,Center for Computational Sciences, University of Goettingen, Goettingen, Germany
|
18
|
Functional conserved non-coding elements among tunicates and chordates. Dev Biol 2019; 448:101-110. [DOI: 10.1016/j.ydbio.2018.12.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/29/2018] [Revised: 12/10/2018] [Accepted: 12/11/2018] [Indexed: 11/22/2022]
|
19
|
Comparative genomics reveals origin of MIR159A–MIR159B paralogy, and complexities of PTGS interaction between miR159 and target GA-MYBs in Brassicaceae. Mol Genet Genomics 2019; 294:693-714. [DOI: 10.1007/s00438-019-01540-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/29/2018] [Accepted: 02/23/2019] [Indexed: 10/27/2022]
|
20
|
Abstract
Rapidly improving sequencing technology coupled with computational developments in sequence assembly are making reference-quality genome assembly economical. Hundreds of vertebrate genome assemblies are now publicly available, and projects are being proposed to sequence thousands of additional species in the next few years. Such dense sampling of the tree of life should give an unprecedented new understanding of evolution and allow a detailed determination of the events that led to the wealth of biodiversity around us. To gain this knowledge, these new genomes must be compared through genome alignment (at the sequence level) and comparative annotation (at the gene level). However, different alignment and annotation methods have different characteristics; before starting a comparative genomics analysis, it is important to understand the nature of, and biases and limitations inherent in, the chosen methods. This review is intended to act as a technical but high-level overview of the field that should provide this understanding. We briefly survey the state of the genome alignment and comparative annotation fields and potential future directions for these fields in a new, large-scale era of comparative genomics.
Affiliation(s)
- Joel Armstrong
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA;
- 10x Genomics, Pleasanton, California 94566, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA;
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA;
|
21
|
Akhtar MM, Micolucci L, Islam MS, Olivieri F, Procopio AD. A Practical Guide to miRNA Target Prediction. Methods Mol Biol 2019; 1970:1-13. [PMID: 30963484 DOI: 10.1007/978-1-4939-9207-2_1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
MicroRNAs (miRNAs) are small endogenous noncoding RNA molecules that posttranscriptionally regulate gene expression. Since their discovery, a huge number of miRNAs have been identified in a wide range of species. Through binding to the 3' UTR of mRNA, miRNA can block translation or stimulate degradation of the targeted mRNA, thus affecting nearly all biological processes. Prediction and identification of miRNA target genes is crucial toward understanding the biology of miRNAs. Currently, a number of sophisticated bioinformatics approaches are available to perform effective prediction of miRNA target sites. In this chapter, we present the major features that most algorithms take into account to efficiently predict miRNA target: seed match, free energy, conservation, target site accessibility, and contribution of multiple binding sites. We also give an overview of the frequently used bioinformatics tools for miRNA target prediction. Understanding the basis of these prediction methodologies may help users to better select the appropriate tools and analyze their output.
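Of the features listed in this abstract, the seed match is the simplest to illustrate. The Python sketch below scans a 3' UTR for perfect complements of a miRNA seed. It assumes the commonly used seed definition of nucleotides 2 to 8 from the miRNA 5' end, considers only Watson-Crick pairing, and ignores free energy, conservation and site accessibility; the function name and the toy sequences are hypothetical and are not taken from the chapter.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    mirna and utr are RNA strings written 5'->3'. The default slice [1:8]
    covers nucleotides 2-8 of the miRNA (an assumed seed definition).
    """
    seed = mirna[seed_start:seed_end]
    # The target site is the reverse complement of the seed.
    site = seed.translate(COMPLEMENT)[::-1]

    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping occurrences
    return positions


# Toy example: the seed GAGGUAG pairs with the site CUACCUC.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAA"
print(seed_sites(mirna, utr))
```

A perfect seed match alone yields many false positives, which is why the tools surveyed in the chapter combine it with the other features the abstract lists.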
Affiliation(s)
| | - Luigina Micolucci
- Laboratory of Experimental Pathology, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy.,Computational Pathology Unit, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy
| | - Md Soriful Islam
- Department of Gynecology and Obstetrics, Johns Hopkins University, School of Medicine, Baltimore, USA
| | - Fabiola Olivieri
- Laboratory of Experimental Pathology, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy.,Center of Clinical Pathology and Innovative Therapies, Italian National Research Center on Aging (INRCA-IRCCS), Ancona, Italy
| | - Antonio Domenico Procopio
- Laboratory of Experimental Pathology, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy.,Center of Clinical Pathology and Innovative Therapies, Italian National Research Center on Aging (INRCA-IRCCS), Ancona, Italy
|
22
|
Joshi G, Chauhan C, Das S. Microsynteny analysis to understand evolution and impact of polyploidization on MIR319 family within Brassicaceae. Dev Genes Evol 2018; 228:227-242. [PMID: 30242472 DOI: 10.1007/s00427-018-0620-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/16/2018] [Accepted: 09/14/2018] [Indexed: 10/28/2022]
Abstract
The availability of a large number of whole-genome sequences allows comparative genomic analysis to reveal and understand evolution of regulatory regions and elements. The role played by events such as whole-genome and segmental duplications followed by genome fractionation in shaping genomic landscape and in expansion of gene families is crucial toward developing insights into evolutionary trends and consequences such as sequence and functional diversification. Members of Brassicaceae are known to have experienced several rounds of whole-genome duplication (WGD) that have been termed as paleopolyploidy, mesopolyploidy, and neopolyploidy. Such repeated events led to the creation and expansion of a large number of gene families. MIR319 is reported to be one of the most ancient and conserved plant MIRNA families and plays a role in growth and development including leaf development, seedling development, and embryo patterning. We have previously reported functional diversification of members of miR319 in Brassica oleracea affecting leaf architecture; however, the evolutionary history of the MIR319 gene family across Brassicaceae remains unknown and requires investigation. We therefore identified homologous and homeologous segments of ca. 100 kb, with or without MIR319, and performed comparative synteny analysis and genome fractionation studies. We detected variable rates of gene retention across members of Brassicaceae when genomic blocks of MIR319a, MIR319b, and MIR319c were compared either between themselves or against the Arabidopsis thaliana genome, which was taken as the base genome. The highest levels of shared genes were found between A. thaliana and Capsella rubella in both MIR319b- and MIR319c-containing genomic segments, and with the closest relative of A. thaliana, A. lyrata, only in the MIR319a-containing segment. Synteny analysis across 12 genomes (with 30 sub-genomes) revealed MIR319c to be the most conserved MIRNA locus (present in 27 genomes/sub-genomes) followed by MIR319a (present in 23 genomes/sub-genomes); MIR319b was found to be frequently lost (present in 20 genomes/sub-genomes) and thus is under the least selection pressure for retention. Genome fractionation revealed extensive and differential loss of MIRNA homeologous loci and flanking genes from various sub-genomes of Brassica species, in accordance with their older history of polyploidy when compared to Camelina sativa, a recent neopolyploid, where the effect of genome fractionation was least. Finally, estimation of phylogenetic relationships using precursor sequences of MIR319 reveals that MIR319a and MIR319b form sister clades, with MIR319c forming a separate clade. An intra-species synteny analysis between MIR319a-, MIR319b-, and MIR319c-containing genomic segments suggests segmental duplications at the base of Brassicaceae to be responsible for the origin of MIR319a and MIR319b.
Affiliation(s)
- Gauri Joshi
- Department of Botany, University of Delhi, Delhi, 110 007, India
| | - Chetan Chauhan
- Department of Botany, University of Delhi, Delhi, 110 007, India
- Centre for Biotechnology, Maharshi Dayanand University, Rohtak, India
| | - Sandip Das
- Department of Botany, University of Delhi, Delhi, 110 007, India.
|
23
|
Nowicki M, Boggess SL, Saxton AM, Hadziabdic D, Xiang QYJ, Molnar T, Huff ML, Staton ME, Zhao Y, Trigiano RN. Haplotyping of Cornus florida and C. kousa chloroplasts: Insights into species-level differences and patterns of plastid DNA variation in cultivars. PLoS One 2018; 13:e0205407. [PMID: 30352068 PMCID: PMC6198962 DOI: 10.1371/journal.pone.0205407] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/21/2018] [Accepted: 09/25/2018] [Indexed: 01/08/2023] Open
Abstract
Chloroplast DNA is a part of plant non-nuclear genome, and is of particular interest for lineage studies. Moreover, the non-coding regions of cpDNA display higher mutation rates than the conserved coding cpDNA, which has been employed for phylogenetic and population research. We analyzed the cpDNA of 332 gDNA samples from collections of Cornus florida and C. kousa (commercial cultivars, breeding selections, and wild kousa accessions from Asia), using the chlorotyping system developed on North America-native, wild accessions of C. florida. Our results indicated significant differences in chlorotype frequencies between the two species. Cornus florida samples were represented by all major chlorotypes previously described, whereas all C. kousa samples analyzed had only one of the chlorotype patterns shown by C. florida. The chlorotyping analytic panel was then expanded by sequencing the targeted three non-coding cpDNA regions. Results indicated a major difference in the maternally-inherited cpDNA between the two closely related Big-Bracted Cornus species. Chlorotype diversity and differences in the proportion of informative sites in the cpDNA regions of focus emphasized the importance of proper loci choice for cpDNA-based comparative studies between the closely related dogwood species. Phylogenetic analyses of the retrieved sequences for the other species of Cornus provided information on the relative utility of the cpDNA regions studied and helped delineate the groups (Big-Bracted, Cornelian Cherries, Blue/White-Fruited) within the genus. Genealogical relationships based on the cpDNA sequences and the inferred chlorotype networks indicated the need for continued analyses across further non-coding cpDNA regions to improve the phylogenetic resolution of dogwoods.
Affiliation(s)
- Marcin Nowicki
- Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, TN, United States of America
| | - Sarah L. Boggess
- Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, TN, United States of America
| | - Arnold M. Saxton
- Department of Animal Science, The University of Tennessee, Knoxville, TN, United States of America
| | - Denita Hadziabdic
- Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, TN, United States of America
| | - Qiu-Yun Jenny Xiang
- Department of Plant and Microbial Biology, North Carolina State University Raleigh, NC, United States of America
| | - Thomas Molnar
- Department of Plant Biology Rutgers, The State University of New Jersey, New Brunswick, NJ, United States of America
| | - Matthew L. Huff
- Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, TN, United States of America
| | - Margaret E. Staton
- Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, TN, United States of America
| | - Yichen Zhao
- Guizhou Key Laboratory of Agro-Bioengineering, Guizhou University, Huaxi, Guiyang, PRC
| | - Robert N. Trigiano
- Department of Entomology and Plant Pathology, The University of Tennessee, Knoxville, TN, United States of America
|
24
|
Malmstrøm M, Britz R, Matschiner M, Tørresen OK, Hadiaty RK, Yaakob N, Tan HH, Jakobsen KS, Salzburger W, Rüber L. The Most Developmentally Truncated Fishes Show Extensive Hox Gene Loss and Miniaturized Genomes. Genome Biol Evol 2018; 10:1088-1103. [PMID: 29684203 PMCID: PMC5906920 DOI: 10.1093/gbe/evy058] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Accepted: 03/13/2018] [Indexed: 12/20/2022] Open
Abstract
The world’s smallest fishes belong to the genus Paedocypris. These miniature fishes are endemic to an extreme habitat: the peat swamp forests in Southeast Asia, characterized by highly acidic blackwater. This threatened habitat is home to a large array of fishes, including a number of miniaturized but also developmentally truncated species. The genus Paedocypris in particular is characterized by profound, organism-wide developmental truncation, resulting in sexually mature individuals of <8 mm in length with a larval phenotype. Here, we report on evolutionary simplification in the genomes of two species of the dwarf minnow genus Paedocypris using whole-genome sequencing. The two species feature unprecedented Hox gene loss and genome reduction in association with their massive developmental truncation. We also show how other genes involved in the development of musculature, nervous system, and skeleton have been lost in Paedocypris, mirroring its highly progenetic phenotype. Further, our analyses suggest two mechanisms responsible for the genome streamlining in Paedocypris in relation to other Cypriniformes: severe intron shortening and reduced repeat content. As the first report on the genomic sequence of a vertebrate species with organism-wide developmental truncation, the results of our work enhance our understanding of genome evolution and how genotypes are translated to phenotypes. In addition, as a naturally simplified system closely related to zebrafish, Paedocypris provides novel insights into vertebrate development.
Affiliation(s)
- Martin Malmstrøm
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Norway; Zoological Institute, University of Basel, Switzerland
| | - Ralf Britz
- Department of Life Sciences, Natural History Museum, London, United Kingdom
| | - Michael Matschiner
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Norway; Zoological Institute, University of Basel, Switzerland
| | - Ole K Tørresen
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Norway
| | - Renny Kurnia Hadiaty
- Ichthyology Laboratory, Division of Zoology, Research Center for Biology, Indonesian Institute of Sciences (LIPI), Cibinong, Indonesia
| | - Norsham Yaakob
- Forest Research Institute Malaysia (FRIM), Kepong, Selangor Darul Ehsan, Malaysia
| | - Heok Hui Tan
- Lee Kong Chian Natural History Museum, National University of Singapore, Singapore
| | - Kjetill Sigurd Jakobsen
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Norway
| | - Walter Salzburger
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Norway; Zoological Institute, University of Basel, Switzerland
| | - Lukas Rüber
- Naturhistorisches Museum Bern, Switzerland; Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, Switzerland
|
25
|
Jones DM, Wells R, Pullen N, Trick M, Irwin JA, Morris RJ. Spatio-temporal expression dynamics differ between homologues of flowering time genes in the allopolyploid Brassica napus. The Plant Journal: For Cell and Molecular Biology 2018; 96:103-118. [PMID: 29989238 PMCID: PMC6175450 DOI: 10.1111/tpj.14020] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/15/2018] [Revised: 05/18/2018] [Accepted: 06/19/2018] [Indexed: 05/20/2023]
Abstract
Polyploidy is a recurrent feature of eukaryotic evolution and has been linked to increases in complexity, adaptive radiation and speciation. Within angiosperms such events have occurred repeatedly in many plant lineages. Here we investigate the retention and spatio-temporal expression dynamics of duplicated genes predicted to regulate the floral transition in Brassica napus (oilseed rape, OSR). We show that flowering time genes are preferentially retained relative to other genes in the OSR genome. Using a transcriptome time series in two tissues (leaf and shoot apex) across development we show that 67% of these retained flowering time genes are expressed. Furthermore, between 64% (leaf) and 74% (shoot apex) of the retained gene homologues show diverged expression patterns relative to each other across development, suggesting neo- or subfunctionalization. A case study of homologues of the shoot meristem identity gene TFL1 reveals differences in cis-regulatory elements that could explain this divergence. Such differences in the expression dynamics of duplicated genes highlight the challenges involved in translating gene regulatory networks from diploid model systems to more complex polyploid crop species.
Affiliation(s)
- D. Marc Jones
- Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Rachel Wells
- Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Nick Pullen
- Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Martin Trick
- Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Judith A. Irwin
- Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Richard J. Morris
- Crop Genetics, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Computational and Systems Biology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
|
26
|
Jain C, Koren S, Dilthey A, Phillippy AM, Aluru S. A fast adaptive algorithm for computing whole-genome homology maps. Bioinformatics 2018; 34:i748-i756. [PMID: 30423094 PMCID: PMC6129286 DOI: 10.1093/bioinformatics/bty597] [Citation(s) in RCA: 101] [Impact Index Per Article: 14.4] [Indexed: 12/26/2022] Open
Abstract
Motivation: Whole-genome alignment is an important problem in genomics for comparing different species, mapping draft assemblies to reference genomes and identifying repeats. However, for large plant and animal genomes, this task remains compute and memory intensive. In addition, current practical methods lack any guarantee on the characteristics of output alignments, thus making them hard to tune for different application requirements. Results: We introduce an approximate algorithm for computing local alignment boundaries between long DNA sequences. Given a minimum alignment length and an identity threshold, our algorithm computes the desired alignment boundaries and identity estimates using k-mer-based statistics, and maintains sufficient probabilistic guarantees on the output sensitivity. Further, to prioritize higher scoring alignment intervals, we develop a plane-sweep based filtering technique which is theoretically optimal and practically efficient. Implementation of these ideas resulted in a fast and accurate assembly-to-genome and genome-to-genome mapper. As a result, we were able to map an error-corrected whole-genome NA12878 human assembly to the hg38 human reference genome in about 1 min total execution time and <4 GB memory using eight CPU threads, achieving significant improvement in memory usage over competing methods. Recall accuracy of computed alignment boundaries was consistently found to be >97% on multiple datasets. Finally, we performed a sensitive self-alignment of the human genome to compute all duplications of length ≥1 Kbp and ≥90% identity. The reported output achieves good recall and covers twice as many bases as the current UCSC browser's segmental duplication annotation. Availability and implementation: https://github.com/marbl/MashMap.
Affiliation(s)
- Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alexander Dilthey
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Institute of Medical Microbiology, University Hospital of Düsseldorf, Düsseldorf, Germany
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
|
27
|
Barton-Owen TB, Ferrier DEK, Somorjai IML. Pax3/7 duplicated and diverged independently in amphioxus, the basal chordate lineage. Sci Rep 2018; 8:9414. [PMID: 29925900 PMCID: PMC6010424 DOI: 10.1038/s41598-018-27700-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/13/2018] [Accepted: 06/06/2018] [Indexed: 01/06/2023] Open
Abstract
The Pax3/7 transcription factor family is integral to developmental gene networks contributing to important innovations in vertebrate evolution, including the neural crest. The basal chordate lineage of amphioxus is ideally placed to understand the dynamics of the gene regulatory network evolution that produced these novelties. We report here the discovery that the cephalochordate lineage possesses two Pax3/7 genes, Pax3/7a and Pax3/7b. The tandem duplication is ancestral to all extant amphioxus, occurring in both Asymmetron and Branchiostoma, but originated after the split from the lineage leading to vertebrates. The two paralogues are differentially expressed during embryonic development, particularly in neural and somitic tissues, suggesting distinct regulation. Our results have implications for the study of amphioxus regeneration, neural plate and crest evolution, and differential tandem paralogue evolution.
Affiliation(s)
- Thomas B Barton-Owen
- University of St Andrews, Gatty Marine Laboratory, Scottish Oceans Institute, East Sands, St Andrews, Fife, KY16 8LB, UK; University of St Andrews, Biomedical Sciences Research Complex, North Haugh, St Andrews, Fife, KY16 9ST, UK
| | - David E K Ferrier
- University of St Andrews, Gatty Marine Laboratory, Scottish Oceans Institute, East Sands, St Andrews, Fife, KY16 8LB, UK
| | - Ildikó M L Somorjai
- University of St Andrews, Gatty Marine Laboratory, Scottish Oceans Institute, East Sands, St Andrews, Fife, KY16 8LB, UK; University of St Andrews, Biomedical Sciences Research Complex, North Haugh, St Andrews, Fife, KY16 9ST, UK
|
28
|
Lee NK, Azizan FL, Wong YS, Omar N. DeepFinder: An integration of feature-based and deep learning approach for DNA motif discovery. BIOTECHNOL BIOTEC EQ 2018. [DOI: 10.1080/13102818.2018.1438209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022] Open
Affiliation(s)
- Nung Kion Lee
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| | - Farah Liyana Azizan
- Centre For Pre-University Studies, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| | - Yu Shiong Wong
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| | - Norshafarina Omar
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
|
29
|
Singh S, Das S, Geeta R. A segmental duplication in the common ancestor of Brassicaceae is responsible for the origin of the paralogs KCS6-KCS5, which are not shared with other angiosperms. Mol Phylogenet Evol 2018; 126:331-345. [PMID: 29698723 DOI: 10.1016/j.ympev.2018.04.018] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/10/2018] [Revised: 04/11/2018] [Accepted: 04/11/2018] [Indexed: 12/14/2022]
Abstract
Novel morphological structures allowed adaptation to dry conditions in early land plants. The cuticle, one such novelty, plays diverse roles in tolerance to abiotic and biotic stresses and plant development. Cuticular waxes represent a major constituent of the cuticle and are comprised of an assortment of chemicals that include, among others, very long chain fatty acids (VLCFAs). Members of the β-ketoacyl coenzyme A synthases (KCS) gene family code for enzymes that are essential for fatty acid biosynthesis. The gene KCS6 (CUT1) is known to be a key player in the production of VLCFA precursors essential for the synthesis of cuticular waxes in the model plant Arabidopsis thaliana (Brassicaceae). Despite its functional importance, relatively little is known about the evolutionary history of KCS6 or its paralog KCS5 in Brassicaceae or beyond. This lacuna becomes important when we extrapolate understanding of mechanisms gained from the model plant to its containing clades Brassicaceae, flowering plants, or beyond. The Brassicaceae, with several sequenced genomes and a known history of paleoploidy, mesopolyploidy and neopolyploidy, offer a system in which to study the evolution and diversification of the KCS6-KCS5 paralogy. Our phylogenetic analyses across green plants, combined with comparative genomic, microsynteny and evolutionary rates analyses across nine genomes of Brassicaceae, reveal that (1) the KCS6-KCS5 paralogy arose as the result of a large segmental duplication in the ancestral Brassicaceae, (2) the KCS6-KCS5 lineage is represented by a single copy in other flowering plant lineages, (3) the duplicated segments undergo different degrees of retention and loss, and (4) most of the genes in the KCS6 and KCS5 gene blocks (including KCS6 and KCS5 themselves) are under purifying selection. 
The last is also true for most members of the KCS gene family in Brassicaceae, except for KCS8, KCS9 and KCS17, which are under positive selection and may be undergoing functional evolution, meriting further investigation. Overall, our results clearly establish that the ancestral KCS6/5 gene duplicated in the Brassicaceae lineage. It is possible that any specialized functions of KCS5 found in Brassicaceae are either part of a set of KCS6/5 gene functions in the rest of the flowering plants, or unique to Brassicaceae.
Affiliation(s)
- Swati Singh
- Department of Botany, University of Delhi, Delhi 110007, India
| | - Sandip Das
- Department of Botany, University of Delhi, Delhi 110007, India
| | - R Geeta
- Department of Botany, University of Delhi, Delhi 110007, India.
|
30
|
MapToGenome: A Comparative Genomic Tool that Aligns Transcript Maps to Sequenced Genomes. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
Abstract
Efforts to generate whole genome assemblies and dense genetic maps have provided a wealth of gene positional information for several vertebrate species. Comparing the relative location of orthologous genes among these genomes provides perspective on genome evolution and can aid in translating genetic information between distantly related organisms. However, large-scale comparisons between genetic maps and genome assemblies can prove challenging because genetic markers are commonly derived from transcribed sequences that are incompletely and variably annotated. We developed the program MapToGenome as a tool for comparing transcript maps and genome assemblies. MapToGenome processes sequence alignments between mapped transcripts and whole genome sequence while accounting for the presence of intronic sequences, and assigns orthology based on user-defined parameters. To illustrate the utility of this program, we used MapToGenome to process alignments between vertebrate genetic maps and genome assemblies: 1) self/self alignments for maps and assemblies of the rat and zebrafish genomes; 2) alignments between vertebrate transcript maps (rat, salamander, zebrafish, and medaka) and the chicken genome; and 3) alignments of the medaka and zebrafish maps to the pufferfish (Tetraodon nigroviridis) genome. Our results show that map-genome alignments can be improved by combining alignments across presumptive intron breaks and ignoring alignments for simple sequence length polymorphism (SSLP) marker sequences. Comparisons between vertebrate maps and genomes reveal broad patterns of conservation among vertebrate genomes and the differential effects of genome rearrangement over time and across lineages.
|
31
|
Golestan Hashemi FS, Razi Ismail M, Rafii Yusop M, Golestan Hashemi MS, Nadimi Shahraki MH, Rastegari H, Miah G, Aslani F. Intelligent mining of large-scale bio-data: Bioinformatics applications. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1364977] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023] Open
Affiliation(s)
- Farahnaz Sadat Golestan Hashemi
- Plant Genetics, AgroBioChem Department, Gembloux Agro-Bio Tech, University of Liege, Liege, Belgium
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mohd Razi Ismail
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mohd Rafii Yusop
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mahboobe Sadat Golestan Hashemi
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Mohammad Hossein Nadimi Shahraki
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Hamid Rastegari
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Gous Miah
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Farzad Aslani
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
|
32
|
The genome of the Gulf pipefish enables understanding of evolutionary innovations. Genome Biol 2016; 17:258. [PMID: 27993155 PMCID: PMC5168715 DOI: 10.1186/s13059-016-1126-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 08/11/2016] [Accepted: 12/05/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Evolutionary origins of derived morphologies ultimately stem from changes in protein structure, gene regulation, and gene content. A well-assembled, annotated reference genome is a central resource for pursuing these molecular phenomena underlying phenotypic evolution. We explored the genome of the Gulf pipefish (Syngnathus scovelli), which belongs to family Syngnathidae (pipefishes, seahorses, and seadragons). These fishes have dramatically derived bodies and a remarkable novelty among vertebrates, the male brood pouch. Results: We produce a reference genome, condensed into chromosomes, for the Gulf pipefish. Gene losses and other changes have occurred in pipefish hox and dlx clusters and in the tbx and pitx gene families, candidate mechanisms for the evolution of syngnathid traits, including an elongated axis and the loss of ribs, pelvic fins, and teeth. We measure gene expression changes in pregnant versus non-pregnant brood pouch tissue and characterize the genomic organization of duplicated metalloprotease genes (patristacins) recruited into the function of this novel structure. Phylogenetic inference using ultraconserved sequences provides an alternative hypothesis for the relationship between orders Syngnathiformes and Scombriformes. Comparisons of chromosome structure among percomorphs show that chromosome number in a pipefish ancestor became reduced via chromosomal fusions. Conclusions: The collected findings from this first syngnathid reference genome open a window into the genomic underpinnings of highly derived morphologies, demonstrating that de novo production of high quality and useful reference genomes is within reach of even small research groups. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1126-6) contains supplementary material, which is available to authorized users.
|
33
|
Abstract
There are millions of sequences deposited in genomic databases, and it is an important task to categorize them according to their structural and functional roles. Sequence comparison is a prerequisite for proper categorization of both DNA and protein sequences, and helps in assigning a putative or hypothetical structure and function to a given sequence. There are various methods available for comparing sequences, alignment being first and foremost for sequences with a small number of base pairs as well as for large-scale genome comparison. Various tools are available for performing pairwise large sequence comparison. The best known tools either perform global alignment or generate local alignments between the two sequences. In this chapter we first provide basic information regarding sequence comparison. This is followed by the description of the PAM and BLOSUM matrices that form the basis of sequence comparison. We also give a practical overview of currently available methods such as BLAST and FASTA, followed by a description and overview of tools available for genome comparison including LAGAN, MUMmer, BLASTZ, and AVID.
|
34
|
Pietzenuk B, Markus C, Gaubert H, Bagwan N, Merotto A, Bucher E, Pecinka A. Recurrent evolution of heat-responsiveness in Brassicaceae COPIA elements. Genome Biol 2016; 17:209. [PMID: 27729060 PMCID: PMC5059998 DOI: 10.1186/s13059-016-1072-3] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 04/05/2016] [Accepted: 09/23/2016] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND The mobilization of transposable elements (TEs) is suppressed by host genome defense mechanisms. Recent studies showed that the cis-regulatory region of Arabidopsis thaliana COPIA78/ONSEN retrotransposons contains heat-responsive elements (HREs), which cause their activation during heat stress. However, it remains unknown whether this is a common and potentially conserved trait and how it has evolved. RESULTS We show that ONSEN, COPIA37, TERESTRA, and ROMANIAT5 are the major families of heat-responsive TEs in A. lyrata and A. thaliana. Heat-responsiveness of COPIA families is correlated with the presence of putative high affinity heat shock factor binding HREs within their long terminal repeats in seven Brassicaceae species. The strong HRE of ONSEN is conserved over millions of years and has evolved by duplication of a proto-HRE sequence, which was already present early in the evolution of the Brassicaceae. However, HREs of most families are species-specific, and in Boechera stricta, the ONSEN HRE accumulated mutations and lost heat-responsiveness. CONCLUSIONS Gain of HREs does not always provide an ultimate selective advantage for TEs, but may increase the probability of their long-term survival during the co-evolution of hosts and genomic parasites.
Affiliation(s)
- Björn Pietzenuk
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, 50829, Germany
- Present address: Department of Plant Physiology, Ruhr-University Bochum, Bochum, Germany
| | - Catarine Markus
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, 50829, Germany
- Department of Crop Science, Federal University of Rio Grande do Sul, Porto Alegre, RS, 91540000, Brazil
| | - Hervé Gaubert
- Department of Plant Biology, University of Geneva, Sciences III, 30 Quai Ernest-Ansermet, 1211, Geneva 4, Switzerland
- Present address: The Sainsbury Laboratory, University of Cambridge, Cambridge, UK
| | - Navratan Bagwan
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, 50829, Germany
- Present address: Cardiovascular proteomics, Centro Nacional de Investigaciones Cardiovasculares, Madrid, 28029, Spain
| | - Aldo Merotto
- Department of Crop Science, Federal University of Rio Grande do Sul, Porto Alegre, RS, 91540000, Brazil
| | - Etienne Bucher
- UMR1345 IRHS, Université d'Angers, INRA, Université Bretagne Loire, SFR4207 QUASAV, 49045, Angers, France
| | - Ales Pecinka
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Cologne, 50829, Germany.
|
35
|
Kuzniewska B, Nader K, Dabrowski M, Kaczmarek L, Kalita K. Adult Deletion of SRF Increases Epileptogenesis and Decreases Activity-Induced Gene Expression. Mol Neurobiol 2016; 53:1478-1493. [PMID: 25636686 PMCID: PMC4789231 DOI: 10.1007/s12035-014-9089-7] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 11/21/2014] [Accepted: 12/29/2014] [Indexed: 11/27/2022]
Abstract
Although the transcription factor serum response factor (SRF) has been suggested to play a role in activity-dependent gene expression and mediate plasticity-associated structural changes in the hippocampus, no unequivocal evidence has been provided for its role in brain pathology, such as epilepsy. A genome-wide program of activity-induced genes that are regulated by SRF also remains unknown. In the present study, we show that the inducible and conditional deletion of SRF in the adult mouse hippocampus increases the epileptic phenotype in the kainic acid model of epilepsy, reflected by more severe and frequent seizures. Moreover, we observe a robust decrease in activity-induced gene transcription in SRF knockout mice. We characterize the genetic program controlled by SRF in neurons and, using functional annotation, we find that SRF target genes are associated with synaptic plasticity and epilepsy. Several of these SRF targets function as regulators of inhibitory or excitatory balance and the structural plasticity of neurons. Interestingly, mutations in those SRF targets have been found to be associated with human neuropsychiatric disorders such as autism and intellectual disability. We also identify novel direct SRF targets in hippocampus: Npas4, Gadd45g, and Zfp36. Altogether, our data indicate that proteins that are highly upregulated by neuronal stimulation, identified in the present study as SRF targets, may function as endogenous protectors against overactivation. Thus, the lack of these effector proteins in SRF knockout animals may lead to uncontrolled excitation and eventually epilepsy.
Affiliation(s)
- Bozena Kuzniewska
- Laboratory of Neurobiology, Nencki Institute, 3 Pasteur Street, Warsaw, Poland
| | - Karolina Nader
- Laboratory of Neurobiology, Nencki Institute, 3 Pasteur Street, Warsaw, Poland
| | - Michal Dabrowski
- Laboratory of Bioinformatics, Neurobiology Center, Nencki Institute, 3 Pasteur Street, Warsaw, Poland
| | - Leszek Kaczmarek
- Laboratory of Neurobiology, Nencki Institute, 3 Pasteur Street, Warsaw, Poland
| | - Katarzyna Kalita
- Laboratory of Neurobiology, Nencki Institute, 3 Pasteur Street, Warsaw, Poland.
|
36
|
Stojanov D, Madevska Bogdanova A, Orzechowski TM. TMO: time and memory optimized algorithm applicable for more accurate alignment of trinucleotide repeat disorders associated genes. BIOTECHNOL BIOTEC EQ 2016. [DOI: 10.1080/13102818.2015.1114428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022] Open
|
37
|
Yu N, Guo X, Gu F, Pan Y. Signalign: An Ontology of DNA as Signal for Comparative Gene Structure Prediction Using Information-Coding-and-Processing Techniques. IEEE Trans Nanobioscience 2016; 15:119-30. [DOI: 10.1109/tnb.2016.2537831] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
|
38
|
La Fortezza M, Schenk M, Cosolo A, Kolybaba A, Grass I, Classen AK. JAK/STAT signalling mediates cell survival in response to tissue stress. Development 2016; 143:2907-19. [DOI: 10.1242/dev.132340] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 11/01/2015] [Accepted: 06/23/2016] [Indexed: 12/31/2022]
Abstract
Tissue homeostasis relies on the ability of tissues to respond to stress. Tissue regeneration and tumour models in Drosophila have shown that JNK is a prominent stress-response pathway promoting injury-induced apoptosis and compensatory proliferation. A central question remaining unanswered is how both responses are balanced by activation of a single pathway. JAK/STAT signalling, a potential JNK target, is implicated in promoting compensatory proliferation. While we observe JAK/STAT activation in imaginal discs upon damage, our data demonstrates that JAK/STAT and its downstream effector Zfh2 promote survival of JNK-signalling cells instead. The JNK component fos and the pro-apoptotic gene hid are regulated in a JAK/STAT-dependent manner. This molecular pathway restrains JNK-induced apoptosis and spatial propagation of JNK-signalling, thereby limiting the extent of tissue damage, as well as facilitating systemic and proliferative responses to injury. We find that the pro-survival function of JAK/STAT also drives tumour growth under conditions of chronic stress. Our study defines JAK/STAT function in tissue stress and illustrates how crosstalk between conserved signalling pathways establishes an intricate equilibrium between proliferation, apoptosis and survival to restore tissue homeostasis.
Affiliation(s)
- Marco La Fortezza
- Ludwig-Maximilians-University Munich, Faculty of Biology, Grosshaderner Strasse 2-4, 82152 Planegg-Martinsried, Germany
| | - Madlin Schenk
- Ludwig-Maximilians-University Munich, Faculty of Biology, Grosshaderner Strasse 2-4, 82152 Planegg-Martinsried, Germany
| | - Andrea Cosolo
- Ludwig-Maximilians-University Munich, Faculty of Biology, Grosshaderner Strasse 2-4, 82152 Planegg-Martinsried, Germany
| | - Addie Kolybaba
- Ludwig-Maximilians-University Munich, Faculty of Biology, Grosshaderner Strasse 2-4, 82152 Planegg-Martinsried, Germany
| | - Isabelle Grass
- Ludwig-Maximilians-University Munich, Faculty of Biology, Grosshaderner Strasse 2-4, 82152 Planegg-Martinsried, Germany
| | - Anne-Kathrin Classen
- Ludwig-Maximilians-University Munich, Faculty of Biology, Grosshaderner Strasse 2-4, 82152 Planegg-Martinsried, Germany
|
39
|
Tsujimura T, Masuda R, Ashino R, Kawamura S. Spatially differentiated expression of quadruplicated green-sensitive RH2 opsin genes in zebrafish is determined by proximal regulatory regions and gene order to the locus control region. BMC Genet 2015; 16:130. [PMID: 26537431 PMCID: PMC4634787 DOI: 10.1186/s12863-015-0288-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/19/2015] [Accepted: 10/27/2015] [Indexed: 11/22/2022]
Abstract
Background: Fish are remarkably diverse in repertoires of visual opsins by gene duplications. Differentiation of their spatiotemporal expression patterns and absorption spectra enables fine-tuning of feature detection in spectrally distinct regions of the visual field during ontogeny. Zebrafish have quadruplicated green-sensitive (RH2) opsin genes in tandem (RH2-1, −2, −3, −4), which are expressed in the short member of the double cones (SDC). The shortest wavelength RH2 subtype (RH2-1) is expressed in the central to dorsal area of the adult retina. The second shortest wave subtype (RH2-2) is expressed overlapping with RH2-1 but extending outside of it. The second longest wave subtype (RH2-3) is expressed surrounding the RH2-2 area, and the longest wave subtype (RH2-4) is expressed outside of the RH2-3 area broadly occupying the ventral area. Expression of the four RH2 genes in SDC requires a single enhancer (RH2-LCR), but the mechanism of their spatial differentiation remains elusive.
Results: Functional comparison of the RH2-LCR with its counterpart in medaka revealed that the regulatory role of the RH2-LCR in SDC-specific expression is evolutionarily conserved. By combining the RH2-LCR and the proximal upstream region of each RH2 gene with fluorescent protein reporters, we show that the RH2-LCR and the RH2-3 proximal regulatory region confer no spatial selectivity of expression in the retina. But those of RH2-1, −2 and −4 are capable of inducing spatial differentiation of expression. Furthermore, by analyzing transgenic fish with a series of arrays consisting of the RH2-LCR and multiple upstream regions of the RH2 genes in different orders, we show that a gene expression pattern related to an upstream region is greatly influenced by another flanking upstream region in a relative position-dependent manner.
Conclusions: The zebrafish RH2 genes except RH2-3 acquired differential cis-elements in the proximal upstream regions to specify the differential expression patterns. The input from these proximal elements collectively dictates the actual gene expression pattern of the locus, context-dependently. Importantly, competition for the RH2-LCR activity among the replicates is critical in this collective regulation, facilitating differentiation of expression among them. This combination of specificity and generality enables seemingly complicated spatial differentiation of duplicated opsin genes characteristic in fish.
Affiliation(s)
- Taro Tsujimura
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, 277-8562, Chiba, Japan; Department of Advanced Nephrology and Regenerative Medicine, Division of Tissue Engineering, the University of Tokyo Hospital, Hongo 7-3-1, Bunkyo-ku, 113-8655, Tokyo, Japan.
| | - Ryoko Masuda
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, 277-8562, Chiba, Japan.
| | - Ryuichi Ashino
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, 277-8562, Chiba, Japan.
| | - Shoji Kawamura
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, the University of Tokyo, Kashiwanoha 5-1-5, Kashiwa, 277-8562, Chiba, Japan.
|
40
|
Hoppe E, Pauly M, Robbins M, Gray M, Kujirakwinja D, Nishuli R, Boji Mungu-Akonkwa DD, Leendertz FH, Ehlers B. Phylogenomic evidence for recombination of adenoviruses in wild gorillas. J Gen Virol 2015. [DOI: 10.1099/jgv.0.000250] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/08/2023]
Affiliation(s)
- Eileen Hoppe
- Division 12 ‘Measles, Mumps, Rubella and Viruses affecting immunocompromised patients’, Robert Koch Institute, 13353 Berlin, Germany
| | - Maude Pauly
- Division 12 ‘Measles, Mumps, Rubella and Viruses affecting immunocompromised patients’, Robert Koch Institute, 13353 Berlin, Germany
- P3 ‘Epidemiology of highly pathogenic microorganisms’, Robert Koch Institute, 13353 Berlin, Germany
| | - Martha Robbins
- Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
| | - Maryke Gray
- Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Deo Kujirakwinja
- Wildlife Conservation Society, Grauer's Gorilla Project, Democratic Republic of the Congo
| | - Radar Nishuli
- Institut Congolais pour la Conservation de la Nature, Democratic Republic of the Congo
| | | | - Fabian H. Leendertz
- P3 ‘Epidemiology of highly pathogenic microorganisms’, Robert Koch Institute, 13353 Berlin, Germany
| | - Bernhard Ehlers
- Division 12 ‘Measles, Mumps, Rubella and Viruses affecting immunocompromised patients’, Robert Koch Institute, 13353 Berlin, Germany
|
41
|
Negre B, Simpson P. The achaete-scute complex in Diptera: patterns of noncoding sequence evolution. J Evol Biol 2015; 28:1770-81. [PMID: 26134680 PMCID: PMC4832353 DOI: 10.1111/jeb.12687] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/07/2015] [Revised: 06/26/2015] [Accepted: 06/29/2015] [Indexed: 11/29/2022]
Abstract
The achaete‐scute complex (AS‐C) has been a useful paradigm for the study of pattern formation and its evolution. achaete‐scute genes have duplicated and evolved distinct expression patterns during the evolution of cyclorraphous Diptera. Are the expression patterns in different species driven by conserved regulatory elements? If so, when did such regulatory elements arise? Here, we have sequenced most of the AS‐C of the fly Calliphora vicina (including the genes achaete, scute and lethal of scute) to compare noncoding sequences with known cis‐regulatory sequences in Drosophila. The organization of the complex is conserved with respect to Drosophila species. There are numerous small stretches of conserved noncoding sequence that, in spite of high sequence turnover, display binding sites for known transcription factors. Synteny of the blocks of conserved noncoding sequences is maintained suggesting not only conservation of the position of regulatory elements but also an origin prior to the divergence between these two species. We propose that some of these enhancers originated by duplication with their target genes.
Affiliation(s)
- B Negre
- Department of Zoology, University of Cambridge, Cambridge, UK
| | - P Simpson
- Department of Zoology, University of Cambridge, Cambridge, UK
|
42
|
Sri T, Mayee P, Singh A. Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas. Dev Genes Evol 2015; 225:287-303. [PMID: 26276216 DOI: 10.1007/s00427-015-0513-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/09/2015] [Accepted: 08/03/2015] [Indexed: 10/23/2022]
Abstract
Whole genome sequence analyses allow unravelling such evolutionary consequences of meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)) as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to reported fractionation status of sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in few Brassica species. In silico analysis characterised Brassica SOC1 as MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to K-box and C-terminal domain. Phylogenetic analyses and multiple sequence alignments depicting shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near identical expression pattern. However, MF2-specific homeolog exhibited significantly higher expression implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is being presented wherein differential homeolog expression is implied in functional diversification.
Affiliation(s)
- Tanu Sri
- Department of Biotechnology, TERI University, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India
| | - Pratiksha Mayee
- Department of Biotechnology, TERI University, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India
- Department of Research, Ankur Seeds Pvt. Ltd., Nagpur, 440018, India
| | - Anandita Singh
- Department of Biotechnology, TERI University, 10 Institutional Area, Vasant Kunj, New Delhi, 110070, India.
|
43
|
Vassalli QA, Anishchenko E, Caputi L, Sordino P, D'Aniello S, Locascio A. Regulatory elements retained during chordate evolution: Coming across tunicates. Genesis 2014; 53:66-81. [DOI: 10.1002/dvg.22838] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/06/2014] [Revised: 11/06/2014] [Accepted: 11/11/2014] [Indexed: 12/22/2022]
Affiliation(s)
- Quirino Attilio Vassalli
- Cellular and Developmental Biology Laboratory; Stazione Zoologica Anton Dohrn; Villa Comunale Naples Italy
| | - Evgeniya Anishchenko
- Cellular and Developmental Biology Laboratory; Stazione Zoologica Anton Dohrn; Villa Comunale Naples Italy
| | - Luigi Caputi
- Cellular and Developmental Biology Laboratory; Stazione Zoologica Anton Dohrn; Villa Comunale Naples Italy
| | - Paolo Sordino
- Cellular and Developmental Biology Laboratory; Stazione Zoologica Anton Dohrn; Villa Comunale Naples Italy
- CNR ISAFOM, Institute for Agricultural and Forest Systems in the Mediterranean, Unitá organizzativa di supporto; Catania Italy
| | - Salvatore D'Aniello
- Cellular and Developmental Biology Laboratory; Stazione Zoologica Anton Dohrn; Villa Comunale Naples Italy
| | - Annamaria Locascio
- Cellular and Developmental Biology Laboratory; Stazione Zoologica Anton Dohrn; Villa Comunale Naples Italy
|
44
|
Mulley JF, Holland PW. Genomic organisation of the seven ParaHox genes of coelacanths. J Exp Zool B Mol Dev Evol 2014; 322:352-8. [PMID: 23775937 PMCID: PMC4471637 DOI: 10.1002/jez.b.22513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2013] [Revised: 04/25/2013] [Accepted: 04/25/2013] [Indexed: 11/30/2022]
Abstract
Human and mouse genomes contain six ParaHox genes implicated in gut and neural patterning. In coelacanths and cartilaginous fish, an additional ParaHox gene exists—Pdx2—that dates back to the genome duplications in early vertebrate evolution. Here we examine the genomic arrangement and flanking genes of all ParaHox genes in coelacanths, to determine the full complement of these genes. We find that coelacanths have seven ParaHox genes in total, in four chromosomal locations, revealing that five gene losses occurred soon after vertebrate genome duplication. Comparison of intergenic sequences reveals that some Pdx1 regulatory regions associated with development of pancreatic islets are older than tetrapods, that Pdx1 and Pdx2 share few if any conserved non-coding elements, and that there is very high sequence conservation between coelacanth species.
Affiliation(s)
- John F. Mulley
- School of Biological Sciences, Bangor University, Bangor, Gwynedd, United Kingdom
|
45
|
Khan MI, Kamal MS. Performance evaluation of Warshall algorithm and dynamic programming for Markov chain in local sequence alignment. Interdiscip Sci 2014; 7:78-81. [PMID: 25118652 DOI: 10.1007/s12539-013-0042-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/21/2013] [Revised: 02/07/2014] [Accepted: 02/23/2014] [Indexed: 11/28/2022]
Abstract
Markov chains are very effective for prediction, particularly on long data sets. In DNA sequencing it is important to determine the presence of certain nucleotides based on the previous history of the data set. We applied the Chapman-Kolmogorov equation to carry out the Markov chain computation. The Chapman-Kolmogorov equation is the key to locating the relevant positions in the DNA chain, and it is a powerful tool in mathematics as well as in other prediction-based research. It incorporates the scores of DNA sequences calculated by various techniques. Our research uses the fundamentals of the Warshall Algorithm (WA) and Dynamic Programming (DP) to measure the scores of DNA segments. The experiments show that the Warshall Algorithm performs well on short DNA sequences, whereas Dynamic Programming performs well on long DNA sequences. In addition to these findings, it is important to measure the risk factors of local sequencing when matching local sequence alignments, whatever their length.
Affiliation(s)
- Mohammad Ibrahim Khan
- Department of Computer Science & Engineering, Chittagong University of Engineering and Technology, Chittagong, 4349, Bangladesh
|
46
|
Poliakov A, Foong J, Brudno M, Dubchak I. GenomeVISTA--an integrated software package for whole-genome alignment and visualization. Bioinformatics 2014; 30:2654-5. [PMID: 24860159 DOI: 10.1093/bioinformatics/btu355] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
With the ubiquitous generation of complete genome assemblies for a variety of species, efficient tools for whole-genome alignment along with user-friendly visualization are critically important. Our VISTA family of tools for comparative genomics, based on algorithms for pairwise and multiple alignments of genomic sequences and whole-genome assemblies, has become one of the standard techniques for comparative analysis. Most of the VISTA programs have been implemented as Web-accessible servers and are extensively used by the biomedical community. In this manuscript, we introduce GenomeVISTA: a novel implementation that incorporates most features of the VISTA family (fast and accurate alignment, visualization capabilities, GUI and analytical tools) within a stand-alone software package. GenomeVISTA thus provides flexibility and security for users who need to conduct whole-genome comparisons on their own computers. Availability and implementation: Implemented in Perl, C/C++ and Java, the source code is freely available for download at the VISTA Web site: http://genome.lbl.gov/vista/.
Affiliation(s)
- Alexandre Poliakov
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
| | - Justin Foong
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
| | - Michael Brudno
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
| | - Inna Dubchak
- US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA, Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON M5G 1X8 Canada, Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4 Canada and Genomics Division, LBNL, Berkeley, CA 94720, USA
|
47
|
Naturally occurring deletions of hunchback binding sites in the even-skipped stripe 3+7 enhancer. PLoS One 2014; 9:e91924. [PMID: 24786295 PMCID: PMC4006794 DOI: 10.1371/journal.pone.0091924] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/19/2013] [Accepted: 02/18/2014] [Indexed: 11/23/2022]
Abstract
Changes in regulatory DNA contribute to phenotypic differences within and between taxa. Comparative studies show that many transcription factor binding sites (TFBS) are conserved between species whereas functional studies reveal that some mutations segregating within species alter TFBS function. Consistently, in this analysis of 13 regulatory elements in Drosophila melanogaster populations, single base and insertion/deletion polymorphism are rare in characterized regulatory elements. Experimentally defined TFBS are nearly devoid of segregating mutations and, as has been shown before, are quite conserved. For instance 8 of 11 Hunchback binding sites in the stripe 3+7 enhancer of even-skipped are conserved between D. melanogaster and Drosophila virilis. Oddly, we found a 72 bp deletion that removes one of these binding sites (Hb8), segregating within D. melanogaster. Furthermore, a 45 bp deletion polymorphism in the spacer between the stripe 3+7 and stripe 2 enhancers, removes another predicted Hunchback site. These two deletions are separated by ∼250 bp, sit on distinct haplotypes, and segregate at appreciable frequency. The Hb8Δ is at 5 to 35% frequency in the new world, but also shows cosmopolitan distribution. There is depletion of sequence variation on the Hb8Δ-carrying haplotype. Quantitative genetic tests indicate that Hb8Δ affects developmental time, but not viability of offspring. The Eve expression pattern differs between inbred lines, but the stripe 3 and 7 boundaries seem unaffected by Hb8Δ. The data reveal segregating variation in regulatory elements, which may reflect evolutionary turnover of characterized TFBS due to drift or co-evolution.
|
48
|
Moritz RLV, Bernt M, Middendorf M. Local similarity search to find gene indicators in mitochondrial genomes. Biology (Basel) 2014; 3:220-242. [PMID: 24833343 PMCID: PMC4009762 DOI: 10.3390/biology3010220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2013] [Revised: 02/15/2014] [Accepted: 02/18/2014] [Indexed: 06/03/2023]
Abstract
Given a set of nucleotide sequences we consider the problem of identifying conserved substrings occurring in homologous genes in a large number of sequences. The problem is solved by identifying certain nodes in a suffix tree containing all substrings occurring in the given nucleotide sequences. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. Two methods for this task are introduced: (1) The annotation guided marker detection method uses gene annotations which might contain a moderate number of errors; (2) The probability based marker detection method determines sequences that appear significantly more often than expected. The approach is successfully applied to the mitochondrial nucleotide sequences, and the corresponding annotations that are available in RefSeq for 2989 metazoan species. We demonstrate that the approach finds appropriate substrings.
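The core idea of this abstract, finding substrings that recur across a large share of the input sequences, can be illustrated with a minimal Python sketch. It is not the authors' method: it replaces the truncated suffix tree with a plain dictionary of fixed-length k-mers, which records the same substring occurrences for a single length k. The function name, the toy sequences, and the parameters `k` and `min_fraction` are assumptions made for illustration only.

```python
from collections import defaultdict


def conserved_kmers(sequences, k, min_fraction):
    """Return the k-mers found in at least min_fraction of the sequences.

    Hypothetical helper: a dictionary of k-mers stands in for the
    truncated suffix tree described in the abstract.
    """
    support = defaultdict(set)  # k-mer -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            support[seq[start:start + k]].add(idx)
    threshold = min_fraction * len(sequences)
    return sorted(kmer for kmer, seen in support.items() if len(seen) >= threshold)


# Toy input (invented): three sequences share the 5-mer "ACGGA".
toy_sequences = ["ACGTACGGA", "TTACGGAC", "GGACGGAT", "CCCCCCCC"]
markers = conserved_kmers(toy_sequences, k=5, min_fraction=0.75)
print(markers)
```

Counting each sequence at most once per k-mer (a set of indices rather than a raw count) keeps a repeat-rich sequence from inflating the support of a substring.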
Affiliation(s)
- Ruby L V Moritz
- Department of Computer Science, University of Leipzig, Postfach 100920, Leipzig D-04009, Germany.
| | - Matthias Bernt
- Department of Computer Science, University of Leipzig, Postfach 100920, Leipzig D-04009, Germany.
| | - Martin Middendorf
- Department of Computer Science, University of Leipzig, Postfach 100920, Leipzig D-04009, Germany.
|
49
|
FOGSAA: Fast Optimal Global Sequence Alignment Algorithm. Sci Rep 2014; 3:1746. [PMID: 23624407 PMCID: PMC3638164 DOI: 10.1038/srep01746] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/13/2012] [Accepted: 04/15/2013] [Indexed: 11/13/2022]
Abstract
In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable for all types of sequences, with any scoring scheme, and with or without affine gap penalty. Compared to NW, FOGSAA achieves a time gain of (70–90)% for highly similar nucleotide sequences (> 80% similarity), and (54–70)% for sequences having (30–80)% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between (25–40)%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23%–53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet set, where the quality of the global alignment is of supreme importance.
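For orientation, the Needleman-Wunsch baseline that this abstract compares against can be sketched in a few lines of Python. This is the classic dynamic-programming recurrence with a linear gap penalty, not FOGSAA itself; the scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b (linear gap penalty).

    Baseline sketch only: fills the full DP table row by row, which is
    the O(len(a) * len(b)) cost that faster methods try to avoid.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("ACGT", "ACT"))
```

Keeping only two rows is enough when just the score is needed; recovering the alignment itself would require the full table or a traceback structure.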
|
50
|
Abstract
The comparison of homologous proteins from different species is a first step toward a function assignment and a reconstruction of the species evolution. Though local alignment is mostly used for this purpose, global alignment is important for constructing multiple alignments or phylogenetic trees. However, statistical significance of global alignments is not completely clear, lacking a specific statistical model to describe alignments or depending on computationally expensive methods like Z-score. Recently we presented a normalized global alignment, defined as the best compromise between global alignment cost and length, and showed that this new technique led to better classification results than Z-score at a much lower computational cost. However, it is necessary to analyze the statistical significance of the normalized global alignment in order to be considered a completely functional algorithm for protein alignment. Experiments with unrelated proteins extracted from the SCOP ASTRAL database showed that normalized global alignment scores can be fitted to a log-normal distribution. This fact, obtained without any theoretical support, can be used to derive statistical significance of normalized global alignments. Results are summarized in a table with fitted parameters for different scoring schemes.
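The significance estimate described in this abstract can be sketched as follows, assuming a set of positive alignment scores from unrelated protein pairs is already available. The sketch fits a normal distribution to the logarithms of those scores (equivalent to a log-normal fit) and reads a tail probability from it. The background values below are invented, the function names are hypothetical, and whether the lower or the upper tail is the relevant one depends on whether the score is a cost or a similarity; the lower tail is shown here as an assumption.

```python
import math
from statistics import NormalDist, fmean, stdev


def fit_lognormal(scores):
    """Fit a log-normal model by fitting a normal to log(scores)."""
    logs = [math.log(s) for s in scores]
    return NormalDist(mu=fmean(logs), sigma=stdev(logs))


def lower_tail_probability(score, log_model):
    """P(X <= score) under the fitted log-normal model."""
    return log_model.cdf(math.log(score))


# Invented background scores standing in for unrelated-pair alignments.
background = [math.exp(x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
model = fit_lognormal(background)
print(lower_tail_probability(1.0, model))
print(lower_tail_probability(0.1, model))
```

For a similarity score, where larger values are more significant, the upper tail `1 - lower_tail_probability(score, model)` would be the quantity of interest instead.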
Affiliation(s)
- Guillermo Peris
- Department de Llenguatges i Sistemes Informátics, Universitat Jaume I , Castelló, Spain
|