1
Raghuraman P, Ramireddy S, Raman G, Park S, Sudandiradoss C. Understanding a point mutation signature D54K in the caspase activation recruitment domain of NOD1 capitulating concerted immunity via atomistic simulation. J Biomol Struct Dyn 2025; 43:3766-3782. [PMID: 38415678 DOI: 10.1080/07391102.2024.2322618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 12/11/2023] [Indexed: 02/29/2024]
Abstract
Point mutation D54K in the N-terminal caspase recruitment domain (CARD) of human nucleotide-binding oligomerization domain-containing protein 1 (NOD1) abrogates an essential downstream interaction with receptor-interacting protein kinase 2 (RIPK2) that is required for combating bacterial infections and inflammatory dysfunction. Here, we addressed the molecular details of the conformational changes and interaction patterns (monomeric and dimeric states) of D54K by signature-based molecular dynamics simulation. Initially, sequence analysis prioritized D54K, among other variants, as a pathogenic mutation based on a sequence signature. Because the mutated residue is highly conserved, we derived a distant ortholog to predict the sequence and structural similarity between native and mutant. This analysis identified 33 communal core residues associated with structural-functional preservation and variation, which also served to infer the cryptic hotspots Cys39, Glu53, Asp54, Glu56, Ile57, Leu74, and Lys78 that determine the inter-helical fold forming homodimers for putative receptor interaction. Subsequently, atomistic simulations with free energy (MM/PB(GB)SA) calculations predicted a structural alteration in the N-terminal region of the mutant CARD, where coils changed to helices (residues 45–83: α3-L4-α4-L6-α6) in contrast to the native protein (residues 45–83: T2-L4-α4-L6-T4). Likewise, in the C-terminal region of the mutant, the segment T1-α7 (residues 93–105) is connected to distorted loops, compared with native α6-L7 (residues 93–105), which may result in conformational misfolding that promotes functional regulation and activation. These structural perturbations of D54K possibly destabilize the flexible adaptation of the critical homotypic NOD1 CARD–RIPK2 CARD interactions (α4 Asp42–Arg488 α5 and α6 Phe86–Lys471 α4), consistent with earlier experimental reports.
Altogether, our findings unveil the conformational plasticity of mutation-dependent immunomodulatory responses and may aid functional validation and clinical investigation of CARD-regulated immunotherapies to prevent systemic infection and inflammation.
Affiliation(s)
- P Raghuraman
- Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
- Sriroopreddy Ramireddy
- Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Department of Genetics and Molecular Biology, School of Health Sciences, The Apollo University, Chittoor, India
- Gurusamy Raman
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
- SeonJoo Park
- Department of Life Sciences, Yeungnam University, Gyeongsan, Gyeongsangbuk-do, Republic of Korea
- C Sudandiradoss
- Department of Biotechnology, School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
2
Chen Y, Zhang T, Xian M, Zhang R, Yang W, Su B, Yang G, Sun L, Xu W, Xu S, Gao H, Xu L, Gao X, Li J. A draft genome of Drung cattle reveals clues to its chromosomal fusion and environmental adaptation. Commun Biol 2022; 5:353. [PMID: 35418663 PMCID: PMC9008013 DOI: 10.1038/s42003-022-03298-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/06/2021] [Accepted: 03/21/2022] [Indexed: 12/02/2022]
Abstract
Drung cattle (Bos frontalis) have 58 chromosomes, differing from the Bos taurus 2n = 60 karyotype. To date, their origin and evolutionary history have not been established conclusively, and the mechanisms of chromosome fusion and environmental adaptation have not been clearly elucidated. Here, we assembled a Drung cattle genome with high integrity and good contiguity, with 13.7-fold and 4.1-fold improvements in contig N50 and scaffold N50, respectively, over the recently published Indian mithun assembly. Speciation time estimation and phylogenetic analysis showed that Drung cattle diverged from Bos taurus into an independent evolutionary clade. Sequence evidence from centromere regions provides clues to the breakpoints in BTA2 and BTA28 centromere satellites. We furthermore integrated a circulation- and contraction-related biological process involving 43 evolutionary genes that participate in pathways associated with the evolution of the cardiovascular system. These findings may have important implications for understanding the molecular mechanisms of chromosome fusion, adaptation to alpine valleys, and cardiovascular function.
Affiliation(s)
- Yan Chen
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Tianliu Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Ming Xian
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Rui Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Weifei Yang
- 1 Gene Co., Ltd, 310051, Hangzhou, P.R. China
- Annoroad Gene Technology (Beijing) Co., Ltd, 100176, Beijing, P.R. China
- Baqi Su
- Drung Cattle Conservation Farm in Jiudang Wood, Drung and Nu Minority Autonomous County, Gongshan, 673500, Kunming, Yunnan, P.R. China
- Guoqiang Yang
- Livestock and Poultry Breed Improvement Center, Nujiang Lisu Minority Autonomous Prefecture, 673199, Kunming, Yunnan, P.R. China
- Limin Sun
- Yunnan Animal Husbandry Service, 650224, Kunming, Yunnan, P.R. China
- Wenkun Xu
- Yunnan Animal Husbandry Service, 650224, Kunming, Yunnan, P.R. China
- Shangzhong Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Huijiang Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Lingyang Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Xue Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
- Junya Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100193, Beijing, P.R. China
3
Song B, Buckler ES, Wang H, Wu Y, Rees E, Kellogg EA, Gates DJ, Khaipho-Burch M, Bradbury PJ, Ross-Ibarra J, Hufford MB, Romay MC. Conserved noncoding sequences provide insights into regulatory sequence and loss of gene expression in maize. Genome Res 2021; 31:1245-1257. [PMID: 34045362 PMCID: PMC8256870 DOI: 10.1101/gr.266528.120] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/27/2020] [Accepted: 05/21/2021] [Indexed: 01/16/2023]
Abstract
Thousands of species will be sequenced in the next few years; however, understanding how their genomes work, without an unlimited budget, requires both molecular and novel evolutionary approaches. We developed a sensitive sequence alignment pipeline to identify conserved noncoding sequences (CNSs) in the Andropogoneae tribe (multiple crop species descended from a common ancestor ∼18 million years ago). The Andropogoneae share similar physiology while being tremendously genomically diverse, harboring a broad range of ploidy levels, structural variation, and transposons. These properties make the Andropogoneae a powerful system for studying CNSs, and we leverage them to understand the function of maize CNSs. We found that 86% of CNSs were composed of annotated features, including introns, UTRs, putative cis-regulatory elements, chromatin loop anchors, noncoding RNA (ncRNA) genes, and several transposable element superfamilies. CNSs were enriched in active regions of DNA replication in the early S phase of the mitotic cell cycle and showed different DNA methylation ratios compared with the genome-wide background. More than half of putative cis-regulatory sequences (identified via other methods) overlapped with CNSs detected in this study. Variants in CNSs were associated with gene expression levels, and CNS absence contributed to loss of gene expression. Furthermore, the evolution of CNSs was associated with the functional diversification of duplicated genes in the context of maize subgenomes. Our results provide a quantitative understanding of the molecular processes governing the evolution of CNSs in maize.
Affiliation(s)
- Baoxing Song
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- Edward S Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, New York 14853, USA
- Hai Wang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- National Maize Improvement Center, Key Laboratory of Crop Heterosis and Utilization, Joint Laboratory for International Cooperation in Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
- Yaoyao Wu
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Evan Rees
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Daniel J Gates
- Department of Evolution and Ecology, University of California Davis, Davis, California 95616, USA
- Merritt Khaipho-Burch
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, New York 14853, USA
- Peter J Bradbury
- Agricultural Research Service, United States Department of Agriculture, Ithaca, New York 14853, USA
- Jeffrey Ross-Ibarra
- Department of Evolution and Ecology, University of California Davis, Davis, California 95616, USA
- Center for Population Biology and Genome Center, University of California Davis, Davis, California 95616, USA
- Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa 50011, USA
- M Cinta Romay
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853, USA
4
Hamada M, Ono Y, Asai K, Frith MC. Training alignment parameters for arbitrary sequencers with LAST-TRAIN. Bioinformatics 2017; 33:926-928. [PMID: 28039163 PMCID: PMC5351549 DOI: 10.1093/bioinformatics/btw742] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 10/09/2016] [Accepted: 11/18/2016] [Indexed: 01/05/2023]
Abstract
Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: The source code is freely available at http://last.cbrc.jp/. Contact: mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online.
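The core idea of fitting substitution scores to observed substitution frequencies can be illustrated with the classical log-odds formula, score(x, y) = round(ln(p_xy / (p_x q_y)) / scale). The Python sketch below is a minimal, hypothetical illustration that assumes aligned base pairs have already been counted from an initial alignment. The function name `log_odds_scores` and the `scale` parameter are illustrative; gap scores are not estimated, and this is not LAST-TRAIN's actual procedure.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def log_odds_scores(pairs: Iterable[Tuple[str, str]],
                    scale: float = 1.0) -> Dict[Tuple[str, str], int]:
    """Integer substitution scores from counted aligned base pairs (hypothetical helper).

    score(x, y) = round(ln(p_xy / (p_x * q_y)) / scale), where p_xy is the
    frequency of reference base x aligned to read base y, and p_x, q_y are
    the marginal base frequencies on the reference and read sides.
    Gap columns are assumed to be excluded from `pairs`; unseen pairs get no score.
    """
    pair_counts = Counter(pairs)
    total = sum(pair_counts.values())
    ref_counts: Counter = Counter()
    read_counts: Counter = Counter()
    for (x, y), c in pair_counts.items():
        ref_counts[x] += c
        read_counts[y] += c
    scores = {}
    for (x, y), c in pair_counts.items():
        p_xy = c / total
        p_x = ref_counts[x] / total
        q_y = read_counts[y] / total
        scores[(x, y)] = round(math.log(p_xy / (p_x * q_y)) / scale)
    return scores
```

Pairs observed more often than chance get positive scores and rarer pairs get negative ones; rounding to integers mirrors the integer score matrices that aligners typically use.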
Affiliation(s)
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Kiyoshi Asai
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
- Martin C Frith
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
5
Genome analysis of Taraxacum kok-saghyz Rodin provides new insights into rubber biosynthesis. Natl Sci Rev 2017. [DOI: 10.1093/nsr/nwx101] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Indexed: 12/11/2022]
6
Yu L, Wang GD, Ruan J, Chen YB, Yang CP, Cao X, Wu H, Liu YH, Du ZL, Wang XP, Yang J, Cheng SC, Zhong L, Wang L, Wang X, Hu JY, Fang L, Bai B, Wang KL, Yuan N, Wu SF, Li BG, Zhang JG, Yang YQ, Zhang CL, Long YC, Li HS, Yang JY, Irwin DM, Ryder OA, Li Y, Wu CI, Zhang YP. Genomic analysis of snub-nosed monkeys (Rhinopithecus) identifies genes and processes related to high-altitude adaptation. Nat Genet 2016; 48:947-52. [DOI: 10.1038/ng.3615] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Received: 02/07/2016] [Accepted: 06/13/2016] [Indexed: 12/31/2022]
7
Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biol 2015; 16:106. [PMID: 25994148 PMCID: PMC4464727 DOI: 10.1186/s13059-015-0670-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 04/06/2015] [Accepted: 05/08/2015] [Indexed: 04/29/2023]
Abstract
We present a new pair-wise genome alignment method, based on a simple concept of finding an optimal set of local alignments. It gains accuracy by not masking repeats, and by using a statistical model to quantify the (un)ambiguity of each alignment part. Compared to previous animal genome alignments, it aligns thousands of locations differently and with much higher similarity, strongly suggesting that the previous alignments are non-orthologous. The previous methods suffer from an overly-strong assumption of long un-rearranged blocks. The new alignments should help find interesting and unusual features, such as fast-evolving elements and micro-rearrangements, which are confounded by alignment errors.
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Risa Kawaguchi
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
- Department of Computational Biology, Faculty of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan
8
Abstract
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
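To make the contrast between expectation values and percent identity concrete, the sketch below evaluates the Karlin-Altschul formula E = K·m·n·e^(−λS) that underlies BLAST statistics, together with the corresponding bit score. The λ and K values in the usage lines (0.267 and 0.041, figures commonly quoted for BLOSUM62 with gap costs 11/1) and the query and database lengths are assumptions for illustration. Real BLAST runs also apply effective-length corrections, and FASTA estimates its statistics differently, from the distribution of scores in each search.

```python
import math


def evalue(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance hits with score >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Assumed parameters (commonly quoted for BLOSUM62, gap open 11 / extend 1).
    LAM, K = 0.267, 0.041
    m, n = 300, 100_000_000  # hypothetical query length and database size (residues)
    for s in (50, 100, 150):
        print(s, round(bit_score(s, LAM, K), 1), f"{evalue(s, m, n, LAM, K):.2e}")
```

Because E grows linearly with database size n, the same raw score is less significant in a larger database, which is one reason searching a smaller comprehensive database can improve sensitivity.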
Affiliation(s)
- William R Pearson
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA
9
Leptidis S, el Azzouzi H, Lok SI, de Weger R, Olieslagers S, Kisters N, Silva GJ, Heymans S, Cuppen E, Berezikov E, De Windt LJ, da Costa Martins P. A deep sequencing approach to uncover the miRNOME in the human heart. PLoS One 2013; 8:e57800. [PMID: 23460909 PMCID: PMC3583901 DOI: 10.1371/journal.pone.0057800] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Received: 06/27/2012] [Accepted: 01/29/2013] [Indexed: 12/31/2022]
Abstract
MicroRNAs (miRNAs) are a class of non-coding RNAs of ∼22 nucleotides in length that constitute a novel class of gene regulators acting through imperfect base-pairing to the 3′UTR of protein-encoding messenger RNAs. Growing evidence indicates that miRNAs are implicated in several pathological processes in myocardial disease. In past years, we have witnessed several profiling attempts using high-density oligonucleotide array-based approaches to identify the complete miRNA content (miRNOME) of the healthy and diseased mammalian heart. These efforts have demonstrated that the failing heart displays differential expression of several dozen miRNAs. While the total number of experimentally validated human miRNAs is roughly two thousand, the number of expressed miRNAs in the human myocardium remains elusive. Our objective was to perform an unbiased assay to identify the miRNOME of the human heart, under both physiological and pathophysiological conditions. We used deep sequencing and bioinformatics to annotate and quantify microRNA expression in healthy and diseased human heart (heart failure secondary to hypertrophic or dilated cardiomyopathy). Our results indicate that the human heart expresses >800 miRNAs, most of which have not been annotated or described so far and some of which are unique to primate species. Furthermore, >250 miRNAs show differential and etiology-dependent expression in human dilated cardiomyopathy (DCM) or hypertrophic cardiomyopathy (HCM). The human cardiac miRNOME still contains a large number of miRNAs that remain virtually unexplored. The current study provides a starting point for a more comprehensive understanding of the role of miRNAs in regulating human heart disease.
Affiliation(s)
- Stefanos Leptidis
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Hamid el Azzouzi
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Sjoukje I. Lok
- Department of Pathology, University Medical Center Utrecht, Utrecht, The Netherlands
- Roel de Weger
- Department of Pathology, University Medical Center Utrecht, Utrecht, The Netherlands
- Serv Olieslagers
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Natasja Kisters
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Gustavo J. Silva
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Stephane Heymans
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Edwin Cuppen
- Hubrecht Institute, Royal Netherlands Academy of Sciences, Utrecht, The Netherlands
- Eugene Berezikov
- Hubrecht Institute, Royal Netherlands Academy of Sciences, Utrecht, The Netherlands
- Leon J. De Windt
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Paula da Costa Martins
- Department of Cardiology, CARIM School for Cardiovascular Diseases, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
10
Abstract
Motivation: With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. Results: We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA. Contact: imh4y@virginia.edu
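Choosing an optimal set of alignments to cover a query under a breakpoint penalty can be framed as a longest-path problem on a directed acyclic graph whose nodes are candidate alignments and whose edges join alignments that follow one another along the query. The Python sketch below is a hypothetical simplification assuming non-overlapping query intervals and a single fixed `breakpoint_penalty`; it ignores reference coordinates, strand, and the overlap handling a real split aligner needs, and it is not YAHA's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    q_start: int  # 0-based start on the query (inclusive)
    q_end: int    # end on the query (exclusive); assumed > q_start
    score: int


def best_alignment_chain(alns: List[Alignment], breakpoint_penalty: int) -> List[Alignment]:
    """Highest-scoring query-ordered chain of non-overlapping alignments.

    Each alignment after the first costs breakpoint_penalty, so a split
    alignment is chosen only when it beats a single alignment by more than
    the penalty.
    """
    if not alns:
        return []
    alns = sorted(alns, key=lambda a: a.q_start)  # a topological order of the DAG
    best = [a.score for a in alns]  # best chain score ending at each node
    prev = [-1] * len(alns)         # back-pointers for traceback
    for j, a in enumerate(alns):
        for i in range(j):
            if alns[i].q_end <= a.q_start:  # edge i -> j
                cand = best[i] + a.score - breakpoint_penalty
                if cand > best[j]:
                    best[j], prev[j] = cand, i
    j = max(range(len(alns)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(alns[j])
        j = prev[j]
    return chain[::-1]
```

Raising the penalty makes the chain prefer fewer, longer alignments; lowering it favors splitting the query at candidate breakpoints.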
Affiliation(s)
- Gregory G Faust
- Department of Computer Science, University of Virginia, Charlottesville, VA 22908, USA
11
Differential impact of the HEN1 homolog HENN-1 on 21U and 26G RNAs in the germline of Caenorhabditis elegans. PLoS Genet 2012; 8:e1002702. [PMID: 22829772 PMCID: PMC3400576 DOI: 10.1371/journal.pgen.1002702] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.2] [Received: 09/09/2011] [Accepted: 02/21/2012] [Indexed: 02/07/2023]
Abstract
RNA interference (RNAi)–related pathways affect gene activity by sequence-specific recruitment of Ago proteins to mRNA target molecules. The sequence specificity of this process stems from small RNA (sRNA) co-factors bound by the Ago protein. Stability of sRNA molecules in some pathways is in part regulated by Hen1-mediated methylation of their 3′ ends. Here we describe the effects of the Caenorhabditis elegans HEN1 RNA methyltransferase homolog, HENN-1, on the different RNAi pathways in this nematode. We reveal differential effects of HENN-1 on the two pathways that are known to employ methylated sRNA molecules: the 26G and 21U pathways. Surprisingly, in the germline, stability of 21U RNAs, the C. elegans piRNAs, is only mildly affected by loss of methylation, and introduction of artificial 21U target RNA does not further destabilize non-methylated 21U RNAs. In contrast, most 26G RNAs display reduced stability and respond to loss of HENN-1 by displaying increased 3′-uridylation frequencies. Within the 26G RNA class, we find that specifically ERGO-1–bound 26G RNAs are modified by HENN-1, while ALG-3/ALG-4–bound 26G RNAs are not. Global gene expression analysis of henn-1 mutants reveals mild effects, including down-regulation of many germline-expressed genes. Our data suggest that, apart from direct effects of reduced 26G RNA levels in henn-1 mutants on gene expression, most effects on global gene expression are indirect. These studies further refine our understanding of endogenous RNAi in C. elegans and the roles of Hen1-like enzymes in these pathways.
Small RNAs (sRNAs) have been shown to be potent regulators of gene expression in many different systems. They act by providing sequence specificity to Argonaute (Ago) proteins that in turn affect the expression and/or stability of mRNAs, or affect chromatin structures through recognition of nascent transcripts. Stability of sRNAs can be regulated by methylation of their 3′ end. This modification prevents addition of uridine residues that can destabilize the sRNA. The enzyme that catalyzes the methylation of sRNAs, HEN1, has been identified in Arabidopsis. We describe studies on the C. elegans homolog of Hen1, henn-1. Our findings show that HENN-1 protein does not stably associate with the Ago proteins binding methylated sRNAs, but that HENN-1 does localize to subcellular regions known to host these factors. We find that the two known methylated sRNA species in C. elegans (21U and 26G) respond differently to loss of henn-1. While HENN-1 is required for 26G RNA stability in the germline, it has limited impact on 21U RNAs. In addition, we demonstrate that only ERGO-1–bound 26G RNAs are methylated, while those bound by ALG-3/4 are not. Our findings further refine the general understanding of 21U and 26G RNA pathways and identify two separable effects of HENN-1 on these RNAi–related mechanisms.
12
Abstract
Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de.
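Two ingredients named above can be sketched in Python: a verification check for an ε-match, and the q-gram lemma that lets filters such as SWIFT discard regions without losing matches. This is a minimal illustration under stated assumptions: the hypothetical `is_epsilon_match` approximates alignment length by the longer substring, which can only overestimate the error rate of an optimal alignment (so the check is conservative), and `qgram_threshold` gives the classical lower bound on shared q-grams, not the exact filter parameters STELLAR derives.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,               # deletion from a
                         cur[j - 1] + 1,            # insertion into a
                         prev[j - 1] + (ca != cb))  # match or substitution
        prev = cur
    return prev[-1]


def is_epsilon_match(a: str, b: str, min_len: int, eps: float) -> bool:
    """Hypothetical check: long enough, and edit distance <= eps * length."""
    length = max(len(a), len(b))  # assumption: stands in for alignment length
    if length < min_len:
        return False
    return edit_distance(a, b) <= eps * length


def qgram_threshold(w: int, e: int, q: int) -> int:
    """q-gram lemma: two length-w strings within e edits share >= w + 1 - (e + 1) * q q-grams."""
    return w + 1 - (e + 1) * q
```

A threshold of zero or less means the lemma gives no filtering power for that choice of w, e, and q, which is why q must be small relative to w / (e + 1).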
13
Nakato R, Gotoh O. Cgaln: fast and space-efficient whole-genome alignment. BMC Bioinformatics 2010; 11:224. [PMID: 20433723 PMCID: PMC2873541 DOI: 10.1186/1471-2105-11-224] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 01/02/2010] [Accepted: 04/30/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle only sequences a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. RESULTS: We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is a "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is a detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. CONCLUSIONS: Cgaln is not only fast and memory-efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences.
We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.
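The two-step idea, a coarse block-level comparison that narrows down where detailed nucleotide alignment is needed, can be sketched as follows. This Python example is a hypothetical simplification: it scores block pairs by shared k-mer counts and keeps pairs above a cutoff. The names `block_similarity` and `candidate_pairs`, the block size, k, and the cutoff are all illustrative, and Cgaln's actual block-level scoring and chaining are not reproduced.

```python
from collections import Counter
from typing import List, Tuple


def kmer_profile(seq: str, k: int) -> Counter:
    """Multiset of overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def block_similarity(seq_a: str, seq_b: str, block: int, k: int) -> List[List[int]]:
    """matrix[i][j] = number of k-mers shared by block i of A and block j of B."""
    prof_a = [kmer_profile(seq_a[i:i + block], k) for i in range(0, len(seq_a), block)]
    prof_b = [kmer_profile(seq_b[j:j + block], k) for j in range(0, len(seq_b), block)]
    # Counter & Counter keeps the minimum count of each shared k-mer.
    return [[sum((pa & pb).values()) for pb in prof_b] for pa in prof_a]


def candidate_pairs(matrix: List[List[int]], min_shared: int) -> List[Tuple[int, int]]:
    """Block pairs worth passing on to detailed nucleotide-level alignment."""
    return [(i, j) for i, row in enumerate(matrix)
            for j, v in enumerate(row) if v >= min_shared]
```

For example, comparing `"AAAACCCC"` with `"CCCCAAAA"` using 4-base blocks and k = 2 yields hits only off the diagonal, corresponding to two swapped blocks, the kind of rearrangement the coarse step is meant to expose before nucleotide-level alignment.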
Affiliation(s)
- Ryuichiro Nakato
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto-shi, Kyoto 606-8501, Japan
14
de Wit E, Linsen SEV, Cuppen E, Berezikov E. Repertoire and evolution of miRNA genes in four divergent nematode species. Genome Res 2009; 19:2064-74. [PMID: 19755563 DOI: 10.1101/gr.093781.109] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.4] [Indexed: 12/30/2022]
Abstract
miRNAs are approximately 22-nt RNA molecules that play important roles in post-transcriptional regulation. We have performed small RNA sequencing in the nematodes Caenorhabditis elegans, C. briggsae, C. remanei, and Pristionchus pacificus, which have diverged up to 400 million years ago, to establish the repertoire and evolutionary dynamics of miRNAs in these species. In addition to previously known miRNA genes from C. elegans and C. briggsae we demonstrate expression of many of their homologs in C. remanei and P. pacificus, and identified in total more than 100 novel expressed miRNA genes, the majority of which belong to P. pacificus. Interestingly, more than half of all identified miRNA genes are conserved at the seed level in all four nematode species, whereas only a few miRNAs appear to be species specific. In our compendium of miRNAs we observed evidence for known mechanisms of miRNA evolution including antisense transcription and arm switching, as well as miRNA family expansion through gene duplication. In addition, we identified a novel mode of miRNA evolution, termed "hairpin shifting," in which an alternative hairpin is formed with up- or downstream sequences, leading to shifting of the hairpin and creation of novel miRNA* species. Finally, we identified 21U-RNAs in all four nematodes, including P. pacificus, where the upstream 21U-RNA motif is more diverged. The identification and systematic analysis of small RNA repertoire in four nematode species described here provides a valuable resource for understanding the evolutionary dynamics of miRNA-mediated gene regulation.
Affiliation(s)
- Elzo de Wit: Hubrecht Institute-KNAW and University Medical Center Utrecht, Cancer Genomics Center, Utrecht 3584 CT, The Netherlands
15
Wolfsberg TG, Madden TL. Sequence similarity searching using the BLAST family of programs. Curr Protoc Mol Biol 2008; Chapter 19:Unit 19.3. [PMID: 18265177 DOI: 10.1002/0471142727.mb1903s46] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
Database sequence similarity searching is carried out thousands of times each day by researchers worldwide and has become a very valuable tool. Over the years, a number of algorithms have been implemented to facilitate database searching. The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows searches to be done quickly and easily while remaining sensitive and statistically rigorous. In this unit, which is a completely new version of its predecessor of the same title, the user learns how to access the databases, determine the correct searching strategies, and apply examples of BLAST searches to his or her own data.
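BLAST programs start from short exact word matches ("seeds") between the query and a database sequence, which are then extended into scored alignments. The minimal Python sketch below shows only that seeding step for nucleotide strings; the function names and the word size `k` are illustrative assumptions, and extension, scoring, and E-value statistics are deliberately omitted.

```python
from collections import defaultdict


def build_word_index(query: str, k: int) -> dict:
    """Map every length-k word of the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return dict(index)


def find_seeds(query: str, subject: str, k: int) -> list:
    """Return (query_pos, subject_pos) pairs where an identical k-word occurs in both."""
    index = build_word_index(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


print(find_seeds("ACGTACGGA", "TTACGGATT", k=4))  # [(3, 1), (4, 2), (5, 3)]
```

The three hits lie on the same diagonal (query position minus subject position equals 2), which is the kind of signal a real search would then extend into a longer alignment.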
Affiliation(s)
- T G Wolfsberg: National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, Maryland, USA
16
Cameron M, Williams HE. Comparing compressed sequences for faster nucleotide BLAST searches. IEEE/ACM Trans Comput Biol Bioinform 2007; 4:349-64. [PMID: 17666756 DOI: 10.1109/tcbb.2007.1029] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/16/2023]
Abstract
Molecular biologists, geneticists, and other life scientists use the BLAST homology search package as their first step for discovery of information about unknown or poorly annotated genomic sequences. There are two main variants of BLAST: BLASTP for searching protein collections and BLASTN for nucleotide collections. Surprisingly, BLASTN has had very little attention; for example, the algorithms it uses do not follow those described in the 1997 BLAST paper and no exact description has been published. It is important that BLASTN is state-of-the-art: Nucleotide collections such as GenBank dwarf the protein collections in size, they double in size almost yearly, and they take many minutes to search on modern general purpose workstations. This paper proposes significant improvements to the BLASTN algorithms. Each of our schemes is based on compressed bytepacked formats that allow queries and collection sequences to be compared four bases at a time, permitting very fast query evaluation using lookup tables and numeric comparisons. Our most significant innovations are two new, fast gapped alignment schemes that allow accurate sequence alignment without decompression of the collection sequences. Overall, our innovations more than double the speed of BLASTN with no effect on accuracy and have been integrated into our new version of BLAST that is freely available for download from http://www.fsa-blast.org/.
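Comparing sequences "four bases at a time" can be illustrated with a 2-bit encoding that packs four nucleotides into each byte, so a single byte comparison tests four bases at once. This Python sketch assumes the alphabet A, C, G, T only and a length that is a multiple of four; the bit order and helper names are assumptions for illustration, not the FSA-BLAST file format.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack A/C/G/T four per byte, 2 bits each, first base in the two high bits."""
    seq = seq.upper()
    if len(seq) % 4:
        raise ValueError("this sketch requires a length that is a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]  # raises KeyError for non-ACGT symbols
        out.append(byte)
    return bytes(out)


def unpack(packed: bytes) -> str:
    """Inverse of pack: expand each byte back into four bases."""
    return "".join(BASES[(byte >> shift) & 3] for byte in packed for shift in (6, 4, 2, 0))


def matching_prefix(a: bytes, b: bytes) -> int:
    """Number of leading bases two packed sequences share, tested 4 bases per comparison."""
    matched = 0
    for x, y in zip(a, b):
        if x == y:
            matched += 4  # one byte comparison confirms four bases
            continue
        for shift in (6, 4, 2, 0):  # locate the first differing base inside this byte
            if ((x >> shift) & 3) != ((y >> shift) & 3):
                return matched
            matched += 1
    return matched


print(matching_prefix(pack("ACGTACGT"), pack("ACGTACTT")))  # 6
```

Only the byte that contains a mismatch is examined base by base; every fully matching byte costs a single comparison.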
Affiliation(s)
- Michael Cameron: School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
17
Gertz EM, Yu YK, Agarwala R, Schäffer AA, Altschul SF. Composition-based statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. BMC Biol 2006; 4:41. [PMID: 17156431 PMCID: PMC1779365 DOI: 10.1186/1741-7007-4-41] [Citation(s) in RCA: 388] [Impact Index Per Article: 20.4] [Received: 09/06/2006] [Accepted: 12/07/2006] [Indexed: 11/29/2022]
Abstract
Background: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server.
Results: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy.
Conclusion: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms.
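Translating a nucleotide sequence "in all six frames" means reading three offsets on the given strand and three on its reverse complement. A minimal Python sketch using the standard genetic code (NCBI translation table 1) follows; the frame labels "+1" to "-3" are one common convention, and the function names are hypothetical. TBLASTN would then align the query protein against these translations, which the sketch does not attempt.

```python
# Standard genetic code (NCBI table 1), codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons give 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Return the six conceptual translations keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTAA")["+1"])  # MA*
```

Stop codons appear as `*`; a search over translated frames has to decide how to treat them, which this sketch leaves open.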
Affiliation(s)
- E Michael Gertz, Yi-Kuo Yu, Richa Agarwala, Alejandro A Schäffer, Stephen F Altschul: National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
18
Berman P, Bertone P, Dasgupta B, Gerstein M, Kao MY, Snyder M. Fast optimal genome tiling with applications to microarray design and homology search. J Comput Biol 2005; 11:766-85. [PMID: 15579244 DOI: 10.1089/cmb.2004.11.766] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
In this paper, we consider several variations of the following basic tiling problem: given a sequence of real numbers and two size-bound parameters, find a set of tiles of maximum total weight such that each tile satisfies the size bounds. A solution to this problem is important to a number of computational biology applications, such as selecting genomic DNA fragments for PCR-based amplicon microarrays and performing homology searches with long sequence queries. Our goal is to design efficient algorithms with linear or near-linear time and space in the normal range of parameter values for these problems. For this purpose, we first discuss the solution to a basic online interval maximum problem via a sliding-window approach and show how to use this solution in a nontrivial manner for many of the tiling problems introduced. We also discuss NP-hardness results and approximation algorithms for generalizing our basic tiling problem to higher dimensions. Finally, computational results from applying our tiling algorithms to genomic sequences of five model eukaryotes are reported.
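The sliding-window idea behind such linear-time tiling algorithms can be illustrated on the simplest case: choosing a single tile (a contiguous run of positions) of maximum total weight whose length lies between two bounds. The Python sketch below uses prefix sums and a monotonic deque, a standard sliding-window-minimum technique. It is not the authors' algorithm, it does not handle sets of tiles or higher dimensions, and its names are hypothetical.

```python
from collections import deque


def best_tile(weights: list, lo: int, hi: int):
    """Maximum-weight contiguous tile whose length is between lo and hi (inclusive).

    Returns (weight, start, end) with end exclusive, or None if no tile fits.
    Runs in O(n): each start index enters and leaves the deque at most once.
    """
    if not 1 <= lo <= hi:
        raise ValueError("need 1 <= lo <= hi")
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)  # prefix[i] = sum of weights[:i]
    best = None
    starts = deque()  # candidate start indices, prefix values increasing front to back
    for end in range(lo, len(weights) + 1):
        new_start = end - lo  # this start just became far enough back
        while starts and prefix[starts[-1]] >= prefix[new_start]:
            starts.pop()  # an earlier start with a larger-or-equal prefix is never better
        starts.append(new_start)
        while starts[0] < end - hi:
            starts.popleft()  # tile would be longer than hi
        start = starts[0]
        weight = prefix[end] - prefix[start]
        if best is None or weight > best[0]:
            best = (weight, start, end)
    return best


print(best_tile([2, -1, 3, -5, 4], lo=2, hi=3))  # (4, 0, 3)
```

The tile weight is `prefix[end] - prefix[start]`, so maximizing it for each `end` reduces to finding the minimum prefix value in a window of admissible starts, which is exactly an online interval (minimum) query.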
Affiliation(s)
- Piotr Berman: Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
19
Cameron M, Williams HE, Cannane A. Improved gapped alignment in BLAST. IEEE/ACM Trans Comput Biol Bioinform 2004; 1:116-29. [PMID: 17048387 DOI: 10.1109/tcbb.2004.32] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.1] [Indexed: 05/12/2023]
Abstract
Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step, semigapped alignment, compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.
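For reference, the gapped local alignment recursion that such techniques accelerate is the classic Smith-Waterman dynamic program. This Python sketch computes only the optimal score, with a linear gap penalty and illustrative scores (match +1, mismatch -1, gap -2) chosen for brevity; real BLAST scoring is more elaborate (for example, affine gap costs), and semigapped and restricted insertion alignment themselves are not shown.

```python
def local_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty.

    O(len(a) * len(b)) time; only two DP rows are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a local alignment restart; prev[j] / curr[j-1] extend a gap
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGTACGT", "ACGACGT"))  # 5 (seven matches, one gap)
```

Every cell of the matrix is filled here; the point of the heuristics described above is to avoid most of that work while rarely changing the reported alignments.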
Affiliation(s)
- Michael Cameron: School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne, Australia
20
Santini S, Boore JL, Meyer A. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res 2003; 13:1111-22. [PMID: 12799348 PMCID: PMC403639 DOI: 10.1101/gr.700503] [Citation(s) in RCA: 115] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022]
Abstract
Comparisons of DNA sequences among evolutionarily distantly related genomes permit identification of conserved functional regions in noncoding DNA. Hox genes are highly conserved in vertebrates, occur in clusters, and are uninterrupted by other genes. Using PipMaker, we aligned the nucleotide sequences of the HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human, and mouse, which are separated by approximately 500 million years of evolution. In support of our approach, several of the putative regulatory elements we identified are already known to regulate the expression of Hox genes. The majority of the newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac database). The regulatory intergenic regions located between the genes that are expressed most anteriorly in the embryo are longer and apparently more evolutionarily conserved than those at the other end of Hox clusters. Different presumed regulatory sequences are retained in either the Aα or Aβ duplicated Hox clusters in the fish lineages. This suggests that the conserved elements are involved in different gene regulatory networks and supports the duplication-deletion-complementation model of functional divergence of duplicated genes.
Affiliation(s)
- Simona Santini: Department of Biology, University of Konstanz, 78457 Konstanz, Germany
21
Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W. PipMaker--a web server for aligning two genomic DNA sequences. Genome Res 2000; 10:577-86. [PMID: 10779500 PMCID: PMC310868 DOI: 10.1101/gr.10.4.577] [Citation(s) in RCA: 849] [Impact Index Per Article: 34.0] [Received: 09/29/1999] [Accepted: 02/01/2000] [Indexed: 11/25/2022]
Abstract
PipMaker (http://bio.cse.psu.edu) is a World-Wide Web site for comparing two long DNA sequences to identify conserved segments and for producing informative, high-resolution displays of the resulting alignments. One display is a percent identity plot (pip), which shows both the position in one sequence and the degree of similarity for each aligning segment between the two sequences in a compact and easily understandable form. Positions along the horizontal axis can be labeled with features such as exons of genes and repetitive elements, and colors can be used to clarify and enhance the display. The web site also provides a plot of the locations of those segments in both species (similar to a dot plot). PipMaker is appropriate for comparing genomic sequences from any two related species, although the types of information that can be inferred (e.g., protein-coding regions and cis-regulatory elements) depend on the level of conservation and the time and divergence rate since the separation of the species. Gene regulatory elements are often detectable as similar, noncoding sequences in species that diverged as much as 100-300 million years ago, such as humans and mice, Caenorhabditis elegans and C. briggsae, or Escherichia coli and Salmonella spp. PipMaker supports analysis of unfinished or "working draft" sequences by permitting one of the two sequences to be in unoriented and unordered contigs.
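A percent identity plot summarizes each aligned segment by its position in the first sequence and its percent identity. As a simplified sketch, assuming that segments are the gap-free blocks of a pairwise alignment supplied as two equal-length rows with '-' marking gaps, those values could be computed as below; this is illustrative Python, not PipMaker's implementation, and the function name is hypothetical.

```python
def gap_free_segments(row1: str, row2: str) -> list:
    """Split a pairwise alignment into gap-free blocks.

    Returns (start, end, percent_identity) per block, with 0-based, end-exclusive
    coordinates in the ungapped first sequence.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    segments = []
    pos = 0       # current position in the ungapped first sequence
    start = None  # start of the open block, or None between blocks
    matches = length = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            if start is not None:  # a gap column closes the current block
                segments.append((start, pos, 100.0 * matches / length))
                start = None
            if x != "-":
                pos += 1  # a residue of sequence 1 aligned to a gap still advances the coordinate
            continue
        if start is None:
            start, matches, length = pos, 0, 0
        length += 1
        matches += x.upper() == y.upper()
        pos += 1
    if start is not None:
        segments.append((start, pos, 100.0 * matches / length))
    return segments


print(gap_free_segments("ACGT--ACGA", "ACGTTTACGT"))  # [(0, 4, 100.0), (4, 8, 75.0)]
```

Each tuple corresponds to one horizontal dash in such a plot: its extent along the first sequence and its height as percent identity.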
Affiliation(s)
- S Schwartz: Department of Computer Science and Engineering, The Pennsylvania State University, University Park 16802, USA
22
Abstract
In database searches for sequence similarity, matches to a distinct sequence region (e.g., protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.
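The domination criterion can be stated directly in code. The Python sketch below is a brute-force O(N²) reference that follows the definition above (I is dominated by J when I's interval lies inside J's and I's score is lower); it is not the O(N log N) algorithm described in the abstract, and the `Hit` type and function name are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    start: int    # first query position covered by the alignment
    end: int      # last query position covered (inclusive)
    score: float  # alignment score


def dominated_by_at_least(hits: List[Hit], k: int) -> List[int]:
    """Indices of hits dominated by at least k other hits (brute force, O(N^2))."""
    result = []
    for i, h in enumerate(hits):
        dominators = sum(
            1
            for j, g in enumerate(hits)
            if j != i and g.start <= h.start and h.end <= g.end and h.score < g.score
        )
        if dominators >= k:  # k is the level of "tolerable redundancy"
            result.append(i)
    return result


hits = [Hit(0, 100, 50.0), Hit(10, 20, 5.0), Hit(5, 30, 20.0), Hit(50, 60, 60.0)]
print(dominated_by_at_least(hits, 1))  # [1, 2]
print(dominated_by_at_least(hits, 2))  # [1]
```

The hit at positions 50 to 60 survives even though it lies inside the first interval, because its score is higher; containment alone is not enough to discard a match.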
Affiliation(s)
- P Berman: Department of Computer Science and Engineering, The Pennsylvania State University, University Park 16802, USA
23
Abstract
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
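A greedy method can beat full dynamic programming on nearly identical sequences because it only explores diagonals of the DP matrix within the current number of differences of the main diagonal, sliding along runs of matches at no cost. The Python sketch below illustrates this with the textbook furthest-reaching-diagonal idea used in O(ND) edit-distance algorithms, for unit-cost edits. It is a swapped-in technique shown for intuition, not the authors' algorithm (which works with alignment scores), and the function name is hypothetical.

```python
def greedy_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (substitutions, insertions, deletions) computed with
    furthest-reaching diagonals; the work grows with the number of differences D."""
    n, m = len(a), len(b)

    def slide(i: int, k: int) -> int:
        # Follow diagonal k (j = i + k) through matching characters at no cost.
        while i < n and i + k < m and a[i] == b[i + k]:
            i += 1
        return i

    furthest = {0: slide(0, 0)}  # diagonal -> furthest row reachable with d edits
    d = 0
    while furthest.get(m - n, -1) < n:  # until the end cell (n, m) is reached
        d += 1
        nxt = {}
        for k in range(-d, d + 1):
            if k < -n or k > m:
                continue  # diagonal lies outside the DP matrix
            candidates = []
            if k in furthest:
                candidates.append(furthest[k] + 1)  # substitution
            if k - 1 in furthest:
                candidates.append(furthest[k - 1])  # insertion (extra character of b)
            if k + 1 in furthest:
                candidates.append(furthest[k + 1] + 1)  # deletion (extra character of a)
            if candidates:
                row = min(max(candidates), n, m - k)  # clamp to the matrix boundary
                nxt[k] = slide(row, k)
        furthest = nxt
    return d


print(greedy_edit_distance("kitten", "sitting"))  # 3
```

For two sequences that differ by only a handful of sequencing errors, the loop runs a handful of times and touches only a narrow band of diagonals, instead of filling the whole n-by-m matrix.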
Affiliation(s)
- Z Zhang: Department of Computer Science and Engineering, The Pennsylvania State University, University Park 16802, USA
24
Wolfsberg TG, Madden TL. Sequence similarity searching using the BLAST family of programs. Curr Protoc Protein Sci 1999; Chapter 2:Unit 2.5. [DOI: 10.1002/0471140864.ps0205s15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Affiliation(s)
- Tyra G. Wolfsberg, Thomas L. Madden: National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, Maryland
25
Zhang Z, Schäffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF. Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res 1998; 26:3986-90. [PMID: 9705509 PMCID: PMC147803 DOI: 10.1093/nar/26.17.3986] [Citation(s) in RCA: 224] [Impact Index Per Article: 8.3] [Indexed: 11/13/2022]
Abstract
Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
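The pattern-matching step that supplies the seeds can be sketched by translating a PROSITE-style pattern into a regular expression and reporting every occurrence. This Python sketch assumes PROSITE-like syntax (elements joined by '-', 'x' for any residue, [..] for a choice, {..} for an exclusion, (n) or (n,m) for repeats, '<' and '>' for sequence ends) and supports only that subset; the helper names are hypothetical, and the alignment extension and score statistics of PHI-BLAST are not shown.

```python
import re

# One pattern element: optional '<' anchor, a residue / 'x' / [choice] / {exclusion},
# an optional repeat count (n) or (n,m), and an optional '>' anchor.
ELEMENT = re.compile(r"(<)?(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a common subset of PROSITE-style pattern syntax to a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        left, core, low, high, right = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if left else "") + core + ("$" if right else ""))
    return "".join(parts)


def find_pattern(pattern: str, sequence: str) -> list:
    """Return (start, matched_text) for every occurrence, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")  # lookahead allows overlaps
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("[AG]-x(4)-G-K-[ST]."))            # [AG].{4}GK[ST]
print(find_pattern("[AG]-x(4)-G-K-[ST]", "MGAAAAGKSLL"))  # [(1, 'GAAAAGKS')]
```

Each reported start position would serve as an anchor around which a local alignment to the query is then built.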
Affiliation(s)
- Z Zhang: Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA