1
|
Plasma microRNA signature in presymptomatic and symptomatic subjects with C9orf72-associated frontotemporal dementia and amyotrophic lateral sclerosis. J Neurol Neurosurg Psychiatry 2021; 92:485-493. [PMID: 33239440 PMCID: PMC8053348 DOI: 10.1136/jnnp-2020-324647] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/20/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 12/13/2022]
Abstract
OBJECTIVE To identify potential biomarkers of preclinical and clinical progression in chromosome 9 open reading frame 72 gene (C9orf72)-associated disease by assessing the expression levels of plasma microRNAs (miRNAs) in C9orf72 patients and presymptomatic carriers. METHODS The PREV-DEMALS study is a prospective study including 22 C9orf72 patients, 45 presymptomatic C9orf72 mutation carriers and 43 controls. We assessed the expression levels of 2576 miRNAs, among which 589 were above noise level, in plasma samples of all participants using RNA sequencing. The expression levels of the differentially expressed miRNAs between patients, presymptomatic carriers and controls were further used to build logistic regression classifiers. RESULTS Four miRNAs were differentially expressed between patients and controls: miR-34a-5p and miR-345-5p were overexpressed, while miR-200c-3p and miR-10a-3p were underexpressed in patients. MiR-34a-5p was also overexpressed in presymptomatic carriers compared with healthy controls, suggesting that miR-34a-5p expression is deregulated in cases with C9orf72 mutation. Moreover, miR-345-5p was also overexpressed in patients compared with presymptomatic carriers, which supports the correlation of miR-345-5p expression with the progression of C9orf72-associated disease. Together, miR-200c-3p and miR-10a-3p underexpression might be associated with full-blown disease. Four presymptomatic subjects in transitional/prodromal stage, close to the disease conversion, exhibited a stronger similarity with the expression levels of patients. CONCLUSIONS We identified a signature of four miRNAs differentially expressed in plasma between clinical conditions that have potential to represent progression biomarkers for C9orf72-associated frontotemporal dementia and amyotrophic lateral sclerosis. 
This study suggests that dysregulation of miRNAs is dynamically altered throughout neurodegenerative diseases progression, and can be detectable even long before clinical onset. TRIAL REGISTRATION NUMBER NCT02590276.
|
2
|
Converting disease maps into heavyweight ontologies: general methodology and application to Alzheimer's disease. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2021:6137817. [PMID: 33590873 DOI: 10.1093/database/baab004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2020] [Revised: 01/17/2021] [Accepted: 01/27/2021] [Indexed: 11/12/2022]
Abstract
Omics technologies offer great promises for improving our understanding of diseases. The integration and interpretation of such data pose major challenges, calling for adequate knowledge models. Disease maps provide curated knowledge about disorders' pathophysiology at the molecular level adapted to omics measurements. However, the expressiveness of disease maps could be increased to help in avoiding ambiguities and misinterpretations and to reinforce their interoperability with other knowledge resources. Ontology is an adequate framework to overcome this limitation, through their axiomatic definitions and logical reasoning properties. We introduce the Disease Map Ontology (DMO), an ontological upper model based on systems biology terms. We then propose to apply DMO to Alzheimer's disease (AD). Specifically, we use it to drive the conversion of AlzPathway, a disease map devoted to AD, into a formal ontology: Alzheimer DMO. We demonstrate that it allows one to deal with issues related to redundancy, naming, consistency, process classification and pathway relationships. Furthermore, we show that it can store and manage multi-omics data. Finally, we expand the model using elements from other resources, such as clinical features contained in the AD Ontology, resulting in an enriched model called ADMO-plus. The current versions of DMO, ADMO and ADMO-plus are freely available at http://bioportal.bioontology.org/ontologies/ADMO.
|
3
|
A DNA methylation signature discriminates between excellent and non-response to lithium in patients with bipolar disorder type 1. Sci Rep 2020; 10:12239. [PMID: 32699220 PMCID: PMC7376060 DOI: 10.1038/s41598-020-69073-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/10/2020] [Accepted: 07/03/2020] [Indexed: 12/15/2022] Open
Abstract
Lithium (Li) is the cornerstone maintenance treatment for bipolar disorders (BD), but response rates are highly variable. To date, no clinical or biological marker is available to reliably define eligibility criteria for a maintenance treatment with Li. We examined whether the prophylactic response to Li (assessed retrospectively) is associated with distinct blood DNA methylation profiles. Bisulfite-treated total blood DNA samples from individuals with BD type 1 (15 excellent-responders (LiERs) versus 11 non-responders (LiNRs)) were used for targeted enrichment of CpG rich genomic regions followed by high-resolution next-generation sequencing to identify differentially methylated regions (DMRs). After controlling for potential confounders we identified 111 DMRs that significantly differ between LiERs and LiNRs with a significant enrichment in neuronal cell components. Logistic regression and receiver operating curves identified a combination of 7 DMRs with a good discriminatory power for response to Li (Area Under the Curve 0.806). Annotated genes associated with these DMRs include Eukaryotic Translation Initiation Factor 2B Subunit Epsilon (EIF2B5), Von Willebrand Factor A Domain Containing 5B2 (VWA5B2), Ral GTPase Activating Protein Catalytic Alpha Subunit 1 (RALGAPA1). Although preliminary and deserving replication, these results suggest that biomarkers of response to Li may be identified through peripheral epigenetic measures.
|
4
|
Long non-coding RNA repertoire and open chromatin regions constitute midbrain dopaminergic neuron-specific molecular signatures. Sci Rep 2019; 9:1409. [PMID: 30723217 PMCID: PMC6363776 DOI: 10.1038/s41598-018-37872-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 02/07/2018] [Accepted: 12/12/2018] [Indexed: 01/24/2023] Open
Abstract
Midbrain dopaminergic (DA) neurons are involved in diverse neurological functions, including control of movements, emotions or reward. In turn, their dysfunctions cause severe clinical manifestations in humans, such as the appearance of motor and cognitive symptoms in Parkinson’s Disease. The physiology and pathophysiology of these neurons are widely studied, mostly with respect to molecular mechanisms implicating protein-coding genes. In contrast, the contribution of non-coding elements of the genome to DA neuron function is poorly investigated. In this study, we isolated DA neurons from E14.5 ventral mesencephalons in mice, and used RNA-seq and ATAC-seq to establish and describe repertoires of long non-coding RNAs (lncRNAs) and putative DNA regulatory regions specific to this neuronal population. We identified 1,294 lncRNAs constituting the repertoire of DA neurons, among which 939 were novel. Most of them were not found in hindbrain serotonergic (5-HT) neurons, indicating a high degree of cell-specificity. This feature was also observed regarding open chromatin regions, as 39% of the ATAC-seq peaks from the DA repertoire were not detected in the 5-HT neurons. Our work provides for the first time DA-specific catalogues of non-coding elements of the genome that will undoubtedly participate in deepening our knowledge regarding DA neuronal development and dysfunctions.
|
5
|
Diet-Induced Dysbiosis and Genetic Background Synergize With Cystic Fibrosis Transmembrane Conductance Regulator Deficiency to Promote Cholangiopathy in Mice. Hepatol Commun 2018; 2:1533-1549. [PMID: 30556040 PMCID: PMC6287479 DOI: 10.1002/hep4.1266] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/24/2018] [Accepted: 09/19/2018] [Indexed: 02/06/2023] Open
Abstract
The most typical expression of cystic fibrosis (CF)-related liver disease is a cholangiopathy that can progress to cirrhosis. We aimed to determine the potential impact of environmental and genetic factors on the development of CF-related cholangiopathy in mice. Cystic fibrosis transmembrane conductance regulator (Cftr)-/- mice and Cftr +/+ littermates in a congenic C57BL/6J background were fed a high medium-chain triglyceride (MCT) diet. Liver histopathology, fecal microbiota, intestinal inflammation and barrier function, bile acid homeostasis, and liver transcriptome were analyzed in 3-month-old males. Subsequently, MCT diet was changed for chow with polyethylene glycol (PEG) and the genetic background for a mixed C57BL/6J;129/Ola background (resulting from three backcrosses), to test their effect on phenotype. C57BL/6J Cftr -/- mice on an MCT diet developed cholangiopathy features that were associated with dysbiosis, primarily Escherichia coli enrichment, and low-grade intestinal inflammation. Compared with Cftr +/+ littermates, they displayed increased intestinal permeability and a lack of secondary bile acids together with a low expression of ileal bile acid transporters. Dietary-induced (chow with PEG) changes in gut microbiota composition largely prevented the development of cholangiopathy in Cftr -/- mice. Regardless of Cftr status, mice in a mixed C57BL/6J;129/Ola background developed fatty liver under an MCT diet. The Cftr -/- mice in the mixed background showed no cholangiopathy, which was not explained by a difference in gut microbiota or intestinal permeability, compared with congenic mice. Transcriptomic analysis of the liver revealed differential expression, notably of immune-related genes, in mice of the congenic versus mixed background. 
In conclusion, our findings suggest that CFTR deficiency causes abnormal intestinal permeability, which, combined with diet-induced dysbiosis and immune-related genetic susceptibility, promotes CF-related cholangiopathy.
|
6
|
A strategy for multimodal data integration: application to biomarkers identification in spinocerebellar ataxia. Brief Bioinform 2017; 19:1356-1369. [DOI: 10.1093/bib/bbx060] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/18/2016] [Indexed: 11/14/2022] Open
|
7
|
Clinical-genetic model predicts incident impulse control disorders in Parkinson's disease. J Neurol Neurosurg Psychiatry 2016; 87:1106-11. [PMID: 27076492 PMCID: PMC5098340 DOI: 10.1136/jnnp-2015-312848] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 12/09/2015] [Accepted: 03/23/2016] [Indexed: 11/04/2022]
Abstract
OBJECTIVES Impulse control disorders (ICD) are commonly associated with dopamine replacement therapy (DRT) in patients with Parkinson's disease (PD). Our aims were to estimate ICD heritability and to predict ICD by a candidate genetic multivariable panel in patients with PD. METHODS Data from de novo patients with PD, drug-naïve and free of ICD behaviour at baseline, were obtained from the Parkinson's Progression Markers Initiative cohort. Incident ICD behaviour was defined as positive score on the Questionnaire for Impulsive-Compulsive Disorders in PD. ICD heritability was estimated by restricted maximum likelihood analysis on whole exome sequencing data. 13 candidate variants were selected from the DRD2, DRD3, DAT1, COMT, DDC, GRIN2B, ADRA2C, SERT, TPH2, HTR2A, OPRK1 and OPRM1 genes. ICD prediction was evaluated by the area under the curve (AUC) of receiver operating characteristic (ROC) curves. RESULTS Among 276 patients with PD included in the analysis, 86% started DRT, 40% were on dopamine agonists (DA), 19% reported incident ICD behaviour during follow-up. We found heritability of this symptom to be 57%. Adding genotypes from the 13 candidate variants significantly increased ICD predictability (AUC=76%, 95% CI (70% to 83%)) compared to prediction based on clinical variables only (AUC=65%, 95% CI (58% to 73%), p=0.002). The clinical-genetic prediction model reached highest accuracy in patients initiating DA therapy (AUC=87%, 95% CI (80% to 93%)). OPRK1, HTR2A and DDC genotypes were the strongest genetic predictive factors. CONCLUSIONS Our results show that adding a candidate genetic panel increases ICD predictability, suggesting potential for developing clinical-genetic models to identify patients with PD at increased risk of ICD development and guide DRT management.
|
8
|
Abstract
Background The opportunistic pathogen Candida glabrata is a member of the Saccharomycetaceae yeasts. Like its close relative Saccharomyces cerevisiae, it underwent a whole-genome duplication followed by an extensive loss of genes. Its genome contains a large number of very long tandem repeats, called megasatellites. In order to determine the whole replication program of the C. glabrata genome and its general chromosomal organization, we used deep-sequencing and chromosome conformation capture experiments. Results We identified 253 replication fork origins, genome wide. Centromeres, HML and HMR loci, and most histone genes are replicated early, whereas natural chromosomal breakpoints are located in late-replicating regions. In addition, 275 autonomously replicating sequences (ARS) were identified during ARS-capture experiments, and their relative fitness was determined during growth competition. Analysis of ARSs allowed us to identify a 17-bp consensus, similar to the S. cerevisiae ARS consensus sequence but slightly more constrained. Megasatellites are not in close proximity to replication origins or termini. Using chromosome conformation capture, we also show that early origins tend to cluster whereas non-subtelomeric megasatellites do not cluster in the yeast nucleus. Conclusions Despite a shorter cell cycle, the C. glabrata replication program shares unexpected striking similarities to S. cerevisiae, in spite of their large evolutionary distance and the presence of highly repetitive large tandem repeats in C. glabrata. No correlation could be found between the replication program and megasatellites, suggesting that their formation and propagation might not be directly caused by replication fork initiation or termination. Electronic supplementary material The online version of this article (doi:10.1186/s12915-015-0177-6) contains supplementary material, which is available to authorized users.
|
9
|
Streptococcus agalactiae clones infecting humans were selected and fixed through the extensive use of tetracycline. Nat Commun 2014; 5:4544. [PMID: 25088811 PMCID: PMC4538795 DOI: 10.1038/ncomms5544] [Citation(s) in RCA: 168] [Impact Index Per Article: 16.8] [Received: 02/20/2014] [Accepted: 06/27/2014] [Indexed: 11/17/2022] Open
Abstract
Streptococcus agalactiae (Group B Streptococcus, GBS) is a commensal of the digestive and genitourinary tracts of humans that emerged as the leading cause of bacterial neonatal infections in Europe and North America during the 1960s. Due to the lack of epidemiological and genomic data, the reasons for this emergence are unknown. Here we show by comparative genome analysis and phylogenetic reconstruction of 229 isolates that the rise of human GBS infections corresponds to the selection and worldwide dissemination of only a few clones. The parallel expansion of the clones is preceded by the insertion of integrative and conjugative elements conferring tetracycline resistance (TcR). Thus, we propose that the use of tetracycline from 1948 onwards led in humans to the complete replacement of a diverse GBS population by only few TcR clones particularly well adapted to their host, causing the observed emergence of GBS diseases in neonates.
|
10
|
SynTView - an interactive multi-view genome browser for next-generation comparative microorganism genomics. BMC Bioinformatics 2013; 14:277. [PMID: 24053737 PMCID: PMC3849071 DOI: 10.1186/1471-2105-14-277] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 03/25/2013] [Accepted: 09/16/2013] [Indexed: 12/31/2022] Open
Abstract
Background Dynamic visualisation interfaces are required to explore the multiple microbial genome data now available, especially those obtained by high-throughput sequencing — a.k.a. “Next-Generation Sequencing” (NGS) — technologies; they would also be useful for “standard” annotated genomes whose chromosome organizations may be compared. Although various software systems are available, few offer an optimal combination of feature-rich capabilities, non-static user interfaces and multi-genome data handling. Results We developed SynTView, a comparative and interactive viewer for microbial genomes, designed to run as either a web-based tool (Flash technology) or a desktop application (AIR environment). The basis of the program is a generic genome browser with sub-maps holding information about genomic objects (annotations). The software is characterised by the presentation of syntenic organisations of microbial genomes and the visualisation of polymorphism data (typically Single Nucleotide Polymorphisms — SNPs) along these genomes; these features are accessible to the user in an integrated way. A variety of specialised views are available and are all dynamically inter-connected (including linear and circular multi-genome representations, dot plots, phylogenetic profiles, SNP density maps, and more). SynTView is not linked to any particular database, allowing the user to plug his own data into the system seamlessly, and use external web services for added functionalities. SynTView has now been used in several genome sequencing projects to help biologists make sense out of huge data sets. Conclusions The most important assets of SynTView are: (i) the interactivity due to the Flash technology; (ii) the capabilities for dynamic interaction between many specialised views; and (iii) the flexibility allowing various user data sets to be integrated. It can thus be used to investigate massive amounts of information efficiently at the chromosome level. 
This innovative approach to data exploration could not be achieved with most existing genome browsers, which are more static and/or do not offer multiple views of multiple genomes. Documentation, tutorials and demonstration sites are available at the URL: http://genopole.pasteur.fr/SynTView.
|
11
|
In silico comparison of Yersinia pestis and Yersinia pseudotuberculosis transcriptomes reveals a higher expression level of crucial virulence determinants in the plague bacillus. Int J Med Microbiol 2011; 301:105-16. [DOI: 10.1016/j.ijmm.2010.08.013] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/07/2010] [Revised: 07/26/2010] [Accepted: 08/04/2010] [Indexed: 10/18/2022] Open
|
12
|
From gene regulation to gene function: regulatory networks in Bacillus subtilis. Comp Funct Genomics 2010; 3:37-41. [PMID: 18628883 PMCID: PMC2447243 DOI: 10.1002/cfg.138] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/05/2001] [Accepted: 12/06/2001] [Indexed: 11/30/2022] Open
Abstract
Bacillus subtilis is a sporulating Gram-positive bacterium that lives primarily in the soil and associated water sources. The publication of the B. subtilis genome sequence and subsequent systematic functional analysis and gene regulation programmes, together with an extensive understanding of its biochemistry and physiology, makes this micro-organism a prime candidate in which to model regulatory networks in silico. In this paper we discuss combined molecular biological and bioinformatical approaches that are being developed to model this organism’s responses to changes in its environment.
|
13
|
Genoscape: a Cytoscape plug-in to automate the retrieval and integration of gene expression data and molecular networks. Bioinformatics 2009; 25:2617-8. [PMID: 19654116 PMCID: PMC2752617 DOI: 10.1093/bioinformatics/btp464] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
Summary: Genoscape is an open-source Cytoscape plug-in that visually integrates gene expression data sets from GenoScript, a transcriptomic database, and KEGG pathways into Cytoscape networks. The generated visualisation highlights gene expression changes and their statistical significance. The plug-in also allows one to browse GenoScript or import transcriptomic data from other sources through tab-separated text files. Genoscape has been successfully used by researchers to investigate the results of gene expression profiling experiments. Availability: Genoscape is open-source software freely available from the Genoscape webpage (http://www.pasteur.fr/recherche/unites/Gim/genoscape/). Installation instructions and a tutorial can also be found at this URL. Contact: Mathieu.clement-ziza@biotec.tu-dresden.de; sandrine.rousseau@pasteur.fr Supplementary information: Supplementary data are available at Bioinformatics online.
|
14
|
From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. MICROBIOLOGY (READING, ENGLAND) 2009; 155:1758-1775. [PMID: 19383706 PMCID: PMC2885750 DOI: 10.1099/mic.0.027839-0] [Citation(s) in RCA: 257] [Impact Index Per Article: 17.1] [Received: 01/26/2009] [Revised: 02/25/2009] [Accepted: 02/25/2009] [Indexed: 11/18/2022]
Abstract
Comparative genomics is the cornerstone of identification of gene functions. The immense number of living organisms precludes experimental identification of functions except in a handful of model organisms. The bacterial domain is split into large branches, among which the Firmicutes occupy a considerable space. Bacillus subtilis has been the model of Firmicutes for decades and its genome has been a reference for more than 10 years. Sequencing the genome involved more than 30 laboratories, with different expertise, in an attempt to make the most of the experimental information that could be associated with the sequence. This had the expected drawback that the sequencing expertise was quite varied among the groups involved, especially at a time when sequencing genomes was extremely hard work. The recent development of very efficient, fast and accurate sequencing techniques, in parallel with the development of high-level annotation platforms, motivated the present resequencing work. The updated sequence has been reannotated in agreement with the UniProt protein knowledge base, keeping in perspective the split between the paleome (genes necessary for sustaining and perpetuating life) and the cenome (genes required for occupation of a niche, suggesting here that B. subtilis is an epiphyte). This should permit investigators to make reliable inferences to prepare validation experiments in a variety of domains of bacterial growth and development as well as build up accurate phylogenies.
|
15
|
Abstract
CandidaDB (http://genodb.pasteur.fr/CandidaDB) was established in 2002 to provide the first genomic database for the human fungal pathogen Candida albicans. The availability of an increasing number of fully or partially completed genome sequences of related fungal species has opened the path for comparative genomics and prompted us to migrate CandidaDB into a multi-genome database. The new version of CandidaDB houses the latest versions of the genomes of C. albicans strains SC5314 and WO-1 along with six genome sequences from species closely related to C. albicans that all belong to the CTG clade of Saccharomycotina—Candida tropicalis, Candida (Clavispora) lusitaniae, Candida (Pichia) guillermondii, Lodderomyces elongisporus, Debaryomyces hansenii, Pichia stipitis—and the reference Saccharomyces cerevisiae genome. CandidaDB includes sequences coding for 54 170 proteins with annotations collected from other databases, enriched with illustrations of structural features and functional domains and data of comparative analyses. In order to take advantage of the integration of multiple genomes in a unique database, new tools using pre-calculated or user-defined comparisons have been implemented that allow rapid access to comparative analysis at the genomic scale.
|
16
|
Abstract
The multitude of bacterial genome sequences being determined has generated new requirements regarding the development of databases and graphical interfaces: these are needed to organize and retrieve biological information from the comparison of large sets of genomes. GenoList (http://genolist.pasteur.fr/GenoList) is an integrated environment dedicated to querying and analyzing genome data from bacterial species. GenoList inherits from the SubtiList database and web server, the reference data resource for the Bacillus subtilis genome. The data model was extended to hold information about relationships between genomes (e.g. protein families). The web user interface was designed to primarily take into account biologists’ needs and modes of operation. Along with standard query and browsing capabilities, comparative genomics facilities are available, including subtractive proteome analysis. One key feature is the integration of the many tools accessible in the environment. As an example, it is straightforward to identify the genes that are specific to a group of bacteria, export them as a tab-separated list, get their protein sequences and run a multiple alignment on a subset of these sequences.
|
17
|
Bacillus subtilis genome project: cloning and sequencing of the 97 kb region from 325° to 333°. Mol Microbiol 2006; 10:371-384. [PMID: 28776854 DOI: 10.1111/j.1365-2958.1993.tb01963.x] [Citation(s) in RCA: 144] [Impact Index Per Article: 8.0] [Indexed: 11/30/2022]
Abstract
In the framework of the European project aimed at the sequencing of the Bacillus subtilis genome the DNA region located between gerB (314°) and sacXV (333°) was assigned to the Institut Pasteur. In this paper we describe the cloning and sequencing of a segment of 97 kb of contiguous DNA. Ninety-two open reading frames were predicted to encode putative proteins among which only forty-two were found to display significant similarities to known proteins present in databanks, e.g. amino acid permeases, proteins involved in cell wall or antibiotic biosynthesis, various regulatory proteins, proteins of several dehydrogenase families and enzymes II of the phosphotransferase system involved in sugar transport. Additional experiments led to the identification of the products of new B. subtilis genes, e.g. galactokinase and an operon involved in thiamine biosynthesis.
|
18
|
Abstract
CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB.
|
19
|
Specialized microbial databases for inductive exploration of microbial genome sequences. BMC Genomics 2005; 6:14. [PMID: 15698474 PMCID: PMC549560 DOI: 10.1186/1471-2164-6-14] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 12/01/2004] [Accepted: 02/07/2005] [Indexed: 11/10/2022] Open
Abstract
Background The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison.
|
20
|
Abstract
Huge amounts of genomic information are currently being generated. Therefore, biologists require structured, exhaustive and comparative databases. The PyloriGene database (http://genolist.pasteur.fr/PyloriGene) was developed to respond to these needs, by integrating and connecting the information generated during the sequencing of two distinct strains of Helicobacter pylori. This led to the need for a general annotation consensus, as the physical and functional annotations of the two strains differed significantly in some cases. A revised functional classification system was created to accommodate the existing data and to make it possible to classify coding sequences (CDS) into several functional categories to harmonize CDS classification. The annotation of the two complete genomes was revised in the light of new data, allowing us to reduce the percentage of hypothetical proteins from approximately 40 to 33%. This resulted in the reassignment of functions for 108 CDS (approximately 7% of all CDS). Interestingly, the functions of only approximately 13% of CDS (222 out of 1658 CDS) were annotated as a result of work done directly on H. pylori genes. Finally, comparison of the two published genomes revealed a significant amount of size variation between corresponding (orthologous) CDS. Most of these size variations were due to natural polymorphisms, although other sources of variation were identified, such as pseudogenes, new genes potentially regulated by a slipped-strand mispairing mechanism, or frame-shifts. Of these differences, 113 were due to different start codon assignments, a common problem when constructing physical annotations.
|
21
|
Abstract
SubtiList is the reference database dedicated to the genome of Bacillus subtilis 168, the paradigm of Gram-positive endospore-forming bacteria. Developed in the framework of the B. subtilis genome project, SubtiList provides a curated dataset of DNA and protein sequences, combined with the relevant annotations and functional assignments. Information about gene functions and products is continuously updated by linking relevant bibliographic references. Recently, sequence corrections arising from both systematic verifications and submissions by individual scientists were included in the reference genome sequence. SubtiList is based on a generic relational data schema and a World Wide Web interface developed for the handling of bacterial genomes, called GenoList. The World Wide Web interface was designed to allow users to easily browse through genome data and retrieve information according to common biological queries. SubtiList also provides more elaborate tools, such as pattern searching, which are tightly connected to the overall browsing system. SubtiList is accessible at http://genolist.pasteur.fr/SubtiList/. Similar bacterial databases are accessible at http://genolist.pasteur.fr/.
|
22
|
Leproma: a Mycobacterium leprae genome browser. Leprosy Rev 2001; 72:470-7. [PMID: 11826483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
|
23
|
Abstract
The spore coat protein CotA of Bacillus subtilis displays similarities with multicopper oxidases, including manganese oxidases and laccases. B. subtilis is able to oxidize manganese, but neither CotA nor other sporulation proteins are involved. We demonstrate that CotA is a laccase. Syringaldazine, a specific substrate of laccases, reacted with wild-type spores but not with ΔcotA spores. CotA may participate in the biosynthesis of the brown spore pigment, which appears to be a melanin-like product and to protect against UV light.
|
24
|
The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res 2001; 29:2145-53. [PMID: 11353084 PMCID: PMC55444 DOI: 10.1093/nar/29.10.2145] [Citation(s) in RCA: 208] [Impact Index Per Article: 9.0] [Received: 01/08/2001] [Revised: 03/19/2001] [Accepted: 03/19/2001] [Indexed: 11/14/2022]
Abstract
Mycoplasma pulmonis is a wall-less eubacterium belonging to the Mollicutes (trivial name, mycoplasmas) and responsible for murine respiratory diseases. The genome of strain UAB CTIP is composed of a single circular 963 879 bp chromosome with a G + C content of 26.6 mol%, i.e. the lowest reported among bacteria, Ureaplasma urealyticum apart. This genome contains 782 putative coding sequences (CDSs) covering 91.4% of its length, and a function could be assigned to 486 CDSs whilst 92 matched the gene sequences of hypothetical proteins, leaving 204 CDSs without significant database match. The genome contains a single set of rRNA genes and only 29 tRNA genes. The replication origin oriC was localized by sequence analysis and by using the G + C skew method. Sequence polymorphisms within stretches of repeated nucleotides generate phase-variable protein antigens whilst a recombinase gene is likely to catalyse the site-specific DNA inversions in major M. pulmonis surface antigens. Furthermore, a hemolysin, secreted nucleases and a glyco-protease are predicted virulence factors. Surprisingly, several of the genes previously reported to be essential for a self-replicating minimal cell are missing in the M. pulmonis genome, although this one is larger than the other mycoplasma genomes fully sequenced until now.
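The G + C content and G + C skew calculations mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the non-overlapping window scheme and the window size are assumptions made for illustration. In many bacterial chromosomes the minimum of the cumulative (G - C)/(G + C) curve falls near the replication origin, which is the idea the sketch relies on.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cumulative_gc_skew(seq: str, window: int) -> list[float]:
    """Running sum of (G - C) / (G + C) over non-overlapping windows.

    Windows that contain neither G nor C contribute 0 to the sum.
    A trailing fragment shorter than `window` is ignored.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def skew_minimum_position(seq: str, window: int) -> int:
    """Start coordinate of the window at which the cumulative skew is lowest.

    Raises ValueError if seq is shorter than one window.
    """
    curve = cumulative_gc_skew(seq, window)
    return curve.index(min(curve)) * window
```

On a real chromosome the window would typically span thousands of bases; the value chosen changes only the resolution of the curve, not the principle.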
|
25
|
Abstract
As bacterial genome sequences accumulate, more and more pieces of data suggest that there is a significant correlation between the distribution of genes along the chromosome and the physical architecture of the cell, suggesting that the map of the cell is in the chromosome. Considering sequences and experimental data indicative of cell compartmentalisation, mRNA folding and turnover, as well as known structural features of protein and membrane complexes, we show that preliminary in silico analysis of whole genome sequences strongly substantiates this hypothesis. If there is a correlation between the genome sequence and the cell architecture, it must derive from some selection pressure in the organisms growing in the wild. As a consequence, the underlying constraints should be optimised in genetically modified organisms if one is to expect high product yields. Consequences in terms of gene expression for biotechnology are straightforward: knocking genes out and in genomes should not be randomly performed, but should follow the rules of chromosome organisation.
|
26
|
Abstract
A genome is not a simple collection of genes. We propose here that it can be viewed as being organized as a 'celluloculus' similar to the homunculus of preformists, but pertaining to the category of programmes (or algorithms) rather than to that of architectures or structures: a significant correlation exists between the distribution of genes along the chromosome and the physical architecture of the cell. We review here data supporting this observation, stressing physical constraints operating on the cell's architecture and dynamics, and their consequences in terms of gene and genome structure. If such a correlation exists, it derives from some selection pressure: simple and general physical principles acting at the level of the cell structure are discussed. As a first case in point we see the piling up of planar modules as a stable, entropy-driven, architectural principle that could be at the root of the coupling between the architecture of the cell and the location of genes at specific places in the chromosome. We propose that the specific organization of certain genes whose products have a general tendency to form easily planar modules is a general motor for architectural organization in the bacterial cell. A second mechanism, operating at the transcription level, is described that could account for the efficient building up of complex structures. As an organizing principle we suggest that exploration by biological polymers of the vast space of possible conformation states is constrained by anchoring points. In particular, we suggest that transcription does not always allow the 5'-end of the transcript to go free and explore the many conformations available, but that, in many cases, it remains linked to the transcribing RNA polymerase complex in such a way that loops of RNA, rather than threads with a free end, explore the surrounding medium. 
In bacteria, extension of the loops throughout the cytoplasm would therefore be mediated by the de novo synthesis of ribosomes in growing cells. Termination of transcription and mRNA turnover would accordingly be expected to be controlled by sequence features at both the 3'- and 5'-ends of the molecule. These concepts are discussed taking into account in vitro analysis of genome sequences and experimental data about cell compartmentalization, mRNA folding and turnover, as well as known structural features of protein and membrane complexes.
|
27
|
Abstract
Bacillus subtilis possesses three classes of genes, differing by their codon preference. One class corresponds to prophages or prophage-like elements, indicative of the existence of systematic lateral gene transfer in this organism. The nature of the selection pressure that operates on codon bias is beginning to be understood.
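Codon preference of the kind described here is usually derived from per-gene codon counts. The following Python sketch shows only that first step; the function names are hypothetical, and the classification of genes into three classes reported in the abstract is not reproduced.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, reading from position 0.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each codon observed in the coding sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}
```

Comparing such frequency vectors across all genes of a genome is one way to look for groups of genes with distinct codon preferences.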
|
28
|
Abstract
The completion of the entire 4.2-Mb genome sequence of the gram-positive bacterium Bacillus subtilis has been a milestone for biological studies on this model organism. This paper describes bioinformatics work related to this joint European and Japanese project: methods and strategies for gene annotation and detection of sequencing errors, using an integrated cooperative computer environment (Imagene); construction of a specialized database for data management and a WWW server for data retrieval (SubtiList); DNA sequence analysis, yielding striking results on oligonucleotide bias, repeated sequences, and codon usage, all landmarks of evolutionary events shaping the B. subtilis genome.
|
29
|
Global analysis of genomic texts: the distribution of AGCT tetranucleotides in the Escherichia coli and Bacillus subtilis genomes predicts translational frameshifting and ribosomal hopping in several genes. Electrophoresis 1998; 19:515-27. [PMID: 9588797 DOI: 10.1002/elps.1150190411] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
The present availability of the genomic text of bacteria allows assignment of known biological functions to many genes (typically, half of the genome's gene content). It is now time to try and predict new unexpected functions, using inductive procedures that allow correlating the content of the genomic text to possible biological functions. We show here that analysis of the genomes of Escherichia coli and Bacillus subtilis for the distribution of AGCT motifs predicts that genes exist for which the mRNA molecule can be translated as several different proteins synthesized after ribosomal frameshifting or hopping. Among these genes we found that several coded for the same function in E. coli and B. subtilis. We analyzed in depth the situation of the infB gene (experimentally known to specify synthesis of several proteins differing in their translation starts), the aceF/pdhC gene, the eno gene, and the rplI gene. In addition, genes specific to E. coli were also studied: ompA, ompF and tolA (predicting epigenetic variation that could help escape infection by phages or colicins).
|
30
|
Abstract
Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.
|
31
|
The Bacillus subtilis genome from gerBC (311 degrees) to licR (334 degrees). Microbiology (Reading, England) 1997; 143 (Pt 10):3313-3328. [PMID: 9353933 DOI: 10.1099/00221287-143-10-3313] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.9] [Indexed: 02/05/2023]
Abstract
As part of the international project to sequence the Bacillus subtilis genome, the DNA region located between gerBC (311 degrees) and licR (334 degrees) was assigned to the Institut Pasteur. In this paper, the cloning and sequencing of 176 kb of DNA and the analysis of the sequence of the entire 271 kb region (6.5% of the B. subtilis chromosome) are described; 273 putative coding sequences were identified. Although the complete genome sequences of seven other organisms (five bacteria, one archaeon and the yeast Saccharomyces cerevisiae) are available in public databases, 65 genes from this region of the B. subtilis chromosome encode proteins without significant similarities to other known protein sequences. Among the 208 other genes, 115 have paralogues in the currently known B. subtilis DNA sequences and the products of 178 genes were found to display similarities to protein sequences from public databases for which a function is known. Classification of these genes shows a high proportion of them to be involved in the adaptation to various growth conditions (non-essential cell wall constituents, catabolic and bioenergetic pathways); a small number of the genes are essential or encode anabolic enzymes.
|
32
|
Abstract
In the context of the international project aiming at sequencing the whole genome of Bacillus subtilis we have developed NRSub, a non-redundant database of sequences from this organism. Starting from the B. subtilis sequences available in the repository collections we have removed all encountered duplications, then we have added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage index). We have also added cross-references with EMBL/GenBank/DDBJ, MEDLINE, SWISS-PROT and ENZYME databases. NRSub is distributed through anonymous FTP as a text file in EMBL format and as an ACNUC database. It is also possible to access the database through two dedicated World Wide Web servers located in France (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) and in Japan (http://ddbjs4h.genes.nig.ac.jp/).
|
33
|
The European Bacillus subtilis genome sequencing project: current status and accessibility of the data from a new World Wide Web site. Microbiology (Reading, England) 1996; 142 (Pt 11):2987-91. [PMID: 8969494 DOI: 10.1099/13500872-142-11-2987] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 02/03/2023]
|
34
|
Uneven distribution of GATC motifs in the Escherichia coli chromosome, its plasmids and its phages. J Mol Biol 1996; 257:574-85. [PMID: 8648625 DOI: 10.1006/jmbi.1996.0186] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.8] [Indexed: 02/01/2023]
Abstract
This work reconsiders the GATC motif distribution in a 1.6 Mb segment of the Escherichia coli genome, compared to its distribution in phages and plasmids. At first sight the distribution of GATC words looks random. But when a realistic model of the chromosome (made of average genes having the same codon usage as in the real chromosome) is used as a theoretical reference, strong biases are observed. GATC pairs such as GATCNNGATC are under-represented while there is a strong positive selection for motifs separated by 10, 19, 70 and 1100 bp. The last class is the only one present in E. coli parasites. It can be ascribed to the triggering sequences of the long-patch mismatch repair system. The 6 bp class overlaps with the consensus of CAP (catabolite activator protein) and FNR (fumarate/nitrate regulator) binding sites, thus accounting for counter-selection. The other classes, which could be targets for a nucleic acid-binding protein, are almost always present inside protein coding sequences, and are members of clusters of GATC motifs. Analysis of the genes containing these motifs suggests that they correspond to a regulatory process monitoring the shift from anaerobic to aerobic growth conditions. In particular this regulation, closing down transcription of a large number of genes involved in intermediary metabolism, would be well suited for the cold and oxygen shift from the mammal's gut to the standard environmental conditions. In this process the methylation status of GATC clusters would be very important for tuning transcription, and a DNA binding protein, probably a member of the cold-shock protein family, would be needed for alleviating the effects mediated by slackening of the pace of methylation during the shift.
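The observed side of such an analysis, the distribution of distances between successive GATC motifs, can be sketched in a few lines of Python. The function names are hypothetical, and the sketch does not reproduce the abstract's key step of comparing the observed counts with a model chromosome built from average genes. Distances are measured start to start, so GATCNNGATC counts as a spacing of 6.

```python
from collections import Counter


def motif_positions(seq: str, motif: str = "GATC") -> list[int]:
    """0-based start positions of every occurrence of motif in seq."""
    seq = seq.upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping occurrences are kept.
        start = seq.find(motif, start + 1)
    return positions


def spacing_counts(positions: list[int]) -> Counter:
    """Histogram of start-to-start distances between consecutive motifs."""
    return Counter(b - a for a, b in zip(positions, positions[1:]))
```

Deciding whether a given spacing is over- or under-represented still requires an expected distribution, for example from shuffled or simulated sequences with matching codon usage.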
|
35
|
Abstract
In the context of the international project aimed at sequencing the whole genome of Bacillus subtilis we have developed a non-redundant, fully annotated database of sequences from this organism. Starting from the B. subtilis sequences available in the EMBL, GenBank and DDBJ collections we have removed all encountered duplications and then added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage, etc.). We have also added cross-references to the EMBL, MEDLINE, SWISS-PROT and ENZYME data banks. The present system results from the merging of the NRSub and SubtiList databases, and the sequence contigs used in the two systems are identical. NRSub is distributed as a flatfile in EMBL format (which is supported by most sequence analysis software packages) and as an ACNUC database, while SubtiList is distributed as a relational database under 4th Dimension. It is possible to access the data through two dedicated World Wide Web servers located in France and Japan.
|
36
|
Anaerobic transcription activation in Bacillus subtilis: identification of distinct FNR-dependent and -independent regulatory mechanisms. EMBO J 1995; 14:5984-94. [PMID: 8846791 PMCID: PMC394718 DOI: 10.1002/j.1460-2075.1995.tb00287.x] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
Bacillus subtilis is able to grow anaerobically using alternative electron acceptors, including nitrate or fumarate. We characterized an operon encoding the dissimilatory nitrate reductase subunits homologous to the Escherichia coli narGHJI operon and the narK gene encoding a protein with nitrite extrusion activity. Downstream from narK and co-transcribed with it, a gene (fnr) encoding a protein homologous to E. coli FNR was found. Disruption of fnr abolished both nitrate and fumarate utilization as electron acceptors and anaerobic induction of narK. Four putative FNR binding sites were found in B. subtilis sequences. The consensus sequence, centred at position -41.5, is identical to the consensus for the DNA site for E. coli CAP. Bs-FNR contained a four cysteine residue cluster at its C-terminal end. This is in contrast to Ec-FNR, where a similar cluster is present at the N-terminal end. It is possible that oxygen modulates the activity of both activators by a similar mechanism involving iron. Unlike in E. coli, where fnr expression is weakly repressed by anaerobiosis, fnr gene expression in B. subtilis is strongly activated by anaerobiosis. We have identified in the narK-fnr intergenic region a promoter activated by anaerobiosis independently of FNR. Thus induction of genes involved in anaerobic respiration requires in B. subtilis at least two levels of regulation: activation of fnr transcription and activation of FNR to induce transcription of FNR-dependent promoters.
|
37
|
Abstract
Analysis of the huge volume of data generated by large scale sequencing projects requires the construction of new, sophisticated computer systems. These systems should be able to manage the biological data as well as the results of their analysis. They should also help the user to choose the most appropriate methods, and to string them together in order to solve a global analysis task. In this paper we present the prototype of a software system providing an environment for the analysis of large-scale sequence data. As a first step toward this end, this environment has been put to the test within the Bacillus subtilis genome sequencing project. This system integrates both the descriptive knowledge of the entities involved (genes, regulatory signals and the like) and the methodological knowledge comprising an extensible set of analytical methods. A knowledge representation based on two existing object-oriented models is used to implement this integrated system. In addition, the present prototype provides a suitable user interface both for displaying simultaneously the results generated by several methods and for interacting with the objects. We present in this paper the analysis of a B. subtilis genome fragment, present in data libraries but not annotated. Annotation of the genes present in the fragment allowed us to combine the results of several methods used for predicting coding sequences, and to characterize it as comprising a cryptic phage, the skin element. Comparison between the annotation of the skin element and a standard region of the chromosome indicated that local features of the nucleotide sequence could discriminate between phage and non-phage DNA sequence.
|
38
|
SubtiList: a relational database for the Bacillus subtilis genome. Microbiology (Reading, England) 1995; 141 (Pt 2):261-8. [PMID: 7704253 DOI: 10.1099/13500872-141-2-261] [Citation(s) in RCA: 140] [Impact Index Per Article: 4.8] [Indexed: 01/26/2023]
Abstract
In the framework of the international collaborative project aiming to sequence the whole Bacillus subtilis chromosome, we have created a relational database for managing and analysing information associated with the molecular genetics of this bacterium: SubtiList. It allows recovery of non-redundant DNA sequences of the B. subtilis genome, as well as related information, i.e. genes, proteins, etc. A logical structure has been designed with appropriate links between the different objects, and a set of procedures has been implemented for data updating and management. The database is organized around a core constituted by all known contigs of B. subtilis, i.e. sets of non-redundant sequences created from original entries in the EMBL data library. A user-friendly interface has been developed to make the database easy to consult. Sequence analysis tools have been integrated into the database, such as a program for rapid similarity searching of protein data banks, and a powerful DNA pattern searching program. Thanks to the consistency of SubtiList, we have performed a codon usage analysis by Factorial Correspondence Analysis, and a study of the distribution of the isoelectric points of known proteins of B. subtilis. The SubtiList database is available through anonymous ftp (address 'ftp.pasteur.fr' or IP number 157.99.64.12, directory '/pub/GenomeDB/SubtiList').
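The abstract does not describe how the DNA pattern searching program works. As a rough illustration of what such a search involves, the Python sketch below expands a pattern written with IUPAC nucleotide codes into a regular expression and reports every match, including overlapping ones. The names `IUPAC` and `find_pattern` are hypothetical and are not part of SubtiList.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_pattern(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each occurrence of pattern.

    The pattern may use IUPAC codes; a KeyError is raised for any other
    character. A lookahead is used so that overlapping matches are reported.
    """
    body = "".join(IUPAC[ch] for ch in pattern.upper())
    regex = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq.upper())]
```

Only the forward strand is searched here; a fuller tool would also scan the reverse complement.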
|
39
|
Bacillus subtilis genome project: cloning and sequencing of the 97 kb region from 325 degrees to 333 degrees. Mol Microbiol 1993; 10:371-84. [PMID: 7934828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
In the framework of the European project aimed at the sequencing of the Bacillus subtilis genome, the DNA region located between gerB (314 degrees) and sacXY (333 degrees) was assigned to the Institut Pasteur. In this paper we describe the cloning and sequencing of a segment of 97 kb of contiguous DNA. Ninety-two open reading frames were predicted to encode putative proteins, among which only forty-two were found to display significant similarities to known proteins present in databanks, e.g. amino acid permeases, proteins involved in cell wall or antibiotic biosynthesis, various regulatory proteins, proteins of several dehydrogenase families and enzymes II of the phosphotransferase system involved in sugar transport. Additional experiments led to the identification of the products of new B. subtilis genes, e.g. galactokinase and an operon involved in thiamine biosynthesis.
|
40
|
Abstract
In order to assess the feasibility of semi-automatic procedures for large genome sequencing, a fragment of 9.4 kb of Escherichia coli chromosomal DNA isolated at random was sequenced. It was found to map at 30 min on the chromosome map and to harbour two insertion sequences (IS2 and IS30) as well as several putative coding sequences which had no feature in common with known proteins.
|