1
|
Sharaf A, Nesengani LT, Hayah I, Kuja JO, Mdyogolo S, Omotoriogun TC, Odogwu BA, Beedessee G, Smith RM, Barakat A, Moila AM, El Hamouchi A, Benkahla A, Boukteb A, Elmouhtadi A, Mafwila AL, Abushady AM, Elsherif AK, Ahmed B, Wairuri C, Ndiribe CC, Ebuzome C, Kinnear CJ, Ndlovu DF, Iraqi D, El Fahime E, Assefa E, Ouardi F, Belharfi FZ, Tmimi FZ, Markey FB, Radouani F, Zeukeng F, Mvumbi GL, Ganesan H, Hanachi M, Nigussie H, Charoute H, Benamri I, Mkedder I, Haddadi I, Meftah-Kadmiri I, Mubiru JF, Domelevo Entfellner JBK, Rokani JB, Ogwang J, Daiga JB, Omumbo J, Ideozu JE, Errafii K, Labuschagne K, Komi KK, Tonfack LB, Hadjeras L, Ramantswana M, Chaisi M, Botes MW, Kilian M, Kvas M, Melloul M, Chaouch M, Khyatti M, Abdo M, Phasha-Muchemenye M, Hijri M, Mediouni MR, Hassan MA, Piro M, Mwale M, Maaloum M, Mavhunga M, Olivier NA, Aminou O, Arbani O, Souiai O, Djocgoue PF, Mentag R, Zipfel RD, Tata RB, Megnekou R, Muzemil S, Paez S, Salifu SP, Kagame SP, Selka S, Edwards S, Gaouar SBS, Reda SRA, Fellahi S, Khayi S, Ayed S, Madisha T, Sahil T, Udensi OU, Ras V, Ezebuiro V, Duru VC, David X, Geberemichael Y, Tchiechoua YH, Mungloo-Dilmohamud Z, Chen Z, Happi C, Kariuki T, Ziyomo C, Djikeng A, Badaoui B, Mapholi N, Muigai A, Osuji JO, Ebenezer TE. Establishing African genomics and bioinformatics programs through annual regional workshops. Nat Genet 2024; 56:1556-1565. [PMID: 38977855 DOI: 10.1038/s41588-024-01807-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Accepted: 05/22/2024] [Indexed: 07/10/2024]
Abstract
The African BioGenome Project (AfricaBP) Open Institute for Genomics and Bioinformatics aims to overcome barriers to capacity building through its distributed African regional workshops and prioritizes the exchange of grassroots knowledge and innovation in biodiversity genomics and bioinformatics. In 2023, we implemented 28 workshops on biodiversity genomics and bioinformatics, covering 11 African countries across the 5 African geographical regions. These regional workshops trained 408 African scientists in hands-on molecular biology, genomics and bioinformatics techniques as well as the ethical, legal and social issues associated with acquiring genetic resources. Here, we discuss the implementation of transformative strategies, such as expanding the regional workshop model of AfricaBP to involve multiple countries, institutions and partners, including the proposed creation of an African digital database with sequence information relating to both biodiversity and agriculture. This will ultimately help create a critical mass of skilled genomics and bioinformatics scientists across Africa.
Affiliation(s)
- Abdoallah Sharaf
  - SequAna Core Facility, Department of Biology, University of Konstanz, Konstanz, Germany
  - Genetics Department, Faculty of Agriculture, Ain Shams University, Cairo, Egypt
- Lucky Tendani Nesengani
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Sinebongo Mdyogolo
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Taiwo Crossby Omotoriogun
  - Department of Biological Sciences, Elizade University, Ilara-Mokin, Nigeria
  - A. P. Leventis Ornithological Research Institute, University of Jos, Jos, Nigeria
- Blessing Adanta Odogwu
  - Regional Centre for Biotechnology and Bioresources Research, University of Port Harcourt, Port Harcourt, Nigeria
  - South-South Zonal Centre of Excellence, National Biotechnology Development Agency, Port Harcourt, Nigeria
- Girish Beedessee
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle-upon-Tyne, UK
- Rae Marvin Smith
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Adil El Hamouchi
  - Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
- Alia Benkahla
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Amal Boukteb
  - Field Crops Laboratory, National Institute of Agricultural Research of Tunisia (INRAT), University of Carthage, Tunis, Tunisia
- Amine Elmouhtadi
  - Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
- Antoine Lusala Mafwila
  - Laboratory of Molecular Biology, Department of Basic Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo
- Asmaa Mohammed Abushady
  - Genetics Department, Faculty of Agriculture, Ain Shams University, Cairo, Egypt
  - Biotechnology School, Nile University, Giza, Egypt
- Bulbul Ahmed
  - African Genome Center, University Mohammed VI Polytechnic (UM6P), Ben Guerir, Morocco
- Craig J Kinnear
  - South African Medical Research Council Genomics Platform, Cape Town, South Africa
- Driss Iraqi
  - Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
- Ermias Assefa
  - Bio and Emerging Technology Institute, Addis Ababa, Ethiopia
- Faissal Ouardi
  - Faculty of Sciences, Mohammed V University, Rabat, Morocco
- Fatima Zohra Belharfi
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Fatu Badiane Markey
  - Science for Africa Foundation, Nairobi, Kenya
  - Rutgers University-Newark, Newark, NJ, USA
- Fouzia Radouani
  - Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
- Francis Zeukeng
  - Biotechnology Centre, University of Yaoundé 1, Yaoundé, Cameroon
- Georges Lelo Mvumbi
  - Laboratory of Molecular Biology, Department of Basic Sciences, University of Kinshasa, Kinshasa, Democratic Republic of Congo
- Mariem Hanachi
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Helen Nigussie
  - Department of Microbial Cellular and Molecular Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Hicham Charoute
  - Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
- Ichrak Benamri
  - Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
- Ikram Mkedder
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Imane Haddadi
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Issam Meftah-Kadmiri
  - Plant and Microbial Biotechnology Center, Moroccan Foundation for Advanced Science, Innovation and Research, University Mohammed VI Polytechnic, Ben Guerir, Morocco
- Jackson Franco Mubiru
  - Department of Breeding and Reproduction, National Animal Genetic Resources Centre and Data Bank, Entebbe, Uganda
- Joan Bayowa Rokani
  - Department of Breeding and Reproduction, National Animal Genetic Resources Centre and Data Bank, Entebbe, Uganda
- Joel Ogwang
  - Department of Breeding and Reproduction, National Animal Genetic Resources Centre and Data Bank, Entebbe, Uganda
- Judy Omumbo
  - Science for Africa Foundation, Nairobi, Kenya
- Khaoula Errafii
  - African Genome Center, University Mohammed VI Polytechnic (UM6P), Ben Guerir, Morocco
- Kim Labuschagne
  - Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
- Komi Koukoura Komi
  - Laboratoire des Sciences Biomédicales, Alimentaires et de Santé Environnementale (LaSBASE), Département des Analyses Biomédicales (AMB), Ecole Supérieure des Techniques Biologiques et Alimentaires (ESTBA), Université de Lomé, Lomé, Togo
- Mamohale Chaisi
  - Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
- Marietjie W Botes
  - Division of Medicine, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Marija Kvas
  - Separations (Pty) Ltd, Johannesburg, South Africa
- Marouane Melloul
  - National Center for Scientific and Technical Research, Rabat, Morocco
- Melek Chaouch
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Meriem Khyatti
  - Research Department, Institut Pasteur du Maroc, Casablanca, Morocco
- Mohamed Hijri
  - African Genome Center, University Mohammed VI Polytechnic (UM6P), Ben Guerir, Morocco
- Mohammed Rida Mediouni
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Mohammed Piro
  - Veterinary Genetic Analysis Laboratory, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
- Monica Mwale
  - Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
- Mudzuli Mavhunga
  - Foundational Biodiversity Science, South African National Biodiversity Institute, Pretoria, South Africa
- Nicholas Abraham Olivier
  - Department of Plant and Soil Sciences, University of Pretoria, Pretoria, South Africa
  - Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa
- Oumaima Aminou
  - Veterinary Genetic Analysis Laboratory, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
- Oumayma Arbani
  - Department of Veterinary Pathology and Public Health, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
- Oussema Souiai
  - Laboratory of Bioinformatics, Biomathematics and Biostatistics-LR16IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
- Rachid Mentag
  - Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
- Renate Dorothea Zipfel
  - Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
- Rosette Megnekou
  - Biotechnology Centre, University of Yaoundé 1, Yaoundé, Cameroon
- Sadye Paez
  - Department of Neurogenetics of Language, Rockefeller University, New York, NY, USA
- Samson Pandam Salifu
  - Faculty of Bioscience, College of Science, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana
- Sarra Selka
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Semir Bechir Suheil Gaouar
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Siham Fellahi
  - Veterinary Genetic Analysis Laboratory, Hassan II Agronomy and Veterinary Institute (IAV), Rabat, Morocco
- Slimane Khayi
  - Biotechnology Research Unit, Regional Center of Agricultural Research of Rabat, National Institute of Agricultural Research, Rabat, Morocco
- Soumia Ayed
  - Applied Genetics in Agriculture, Ecology and Public Health Laboratory, University of Abou Bekr Belkaid Tlemcen, Tlemcen, Algeria
- Thabang Madisha
  - Agricultural Research Council, Biotechnology Platform, Pretoria, South Africa
- Verena Ras
  - University of Cape Town, Cape Town, South Africa
- Victor Ezebuiro
  - Regional Centre for Biotechnology and Bioresources Research, University of Port Harcourt, Port Harcourt, Nigeria
  - South-South Zonal Centre of Excellence, National Biotechnology Development Agency, Port Harcourt, Nigeria
- Vincent C Duru
  - Department of Parasitology and Entomology, Nnamdi Azikiwe University, Awka, Nigeria
- Yves H Tchiechoua
  - Department of Biology, Chemistry and Pharmacy, Free University Berlin, Berlin, Germany
- Christian Happi
  - African Centre of Excellence for Genomics of Infectious Diseases, Redeemer's University, Ede, Nigeria
- Appolinaire Djikeng
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
  - International Livestock Research Institute, Nairobi, Kenya
  - Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Edinburgh, UK
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco
- Ntanganedzeni Mapholi
  - College of Agriculture and Environmental Sciences, University of South Africa, Florida, South Africa
- Anne Muigai
  - National Defence University-Kenya, Nakuru, Kenya
  - Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya
- Julian O Osuji
  - Regional Centre for Biotechnology and Bioresources Research, University of Port Harcourt, Port Harcourt, Nigeria
  - South-South Zonal Centre of Excellence, National Biotechnology Development Agency, Port Harcourt, Nigeria
  - Department of Plant Science and Biotechnology, University of Port Harcourt, Port Harcourt, Nigeria
- ThankGod Echezona Ebenezer
  - Early Cancer Institute, Department of Oncology, School of Clinical Medicine, University of Cambridge, Cambridge, UK
|
2
|
Nicolosi Gelis MM, Canino A, Bouchez A, Domaizon I, Laplace-Treyture C, Rimet F, Alric B. Assessing the relevance of DNA metabarcoding compared to morphological identification for lake phytoplankton monitoring. The Science of the Total Environment 2024; 914:169774. [PMID: 38215838 DOI: 10.1016/j.scitotenv.2023.169774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 12/08/2023] [Accepted: 12/28/2023] [Indexed: 01/14/2024]
Abstract
Phytoplankton is a key biological group used to assess the ecological status of lakes. The classical monitoring approach relies on microscopic identification and counting of phytoplankton species, which is time-consuming and requires high taxonomic expertise. High-throughput sequencing, combined with metabarcoding, has recently demonstrated its potential as an alternative approach for plankton surveys. Several studies have confirmed the relevance of the diatom metabarcoding approach to calculate biotic indices based on species ecology. However, phytoplankton communities have not yet benefited from such validation. Here, by comparing the results obtained with the two methods (molecular and microscopic counting), we evaluated the relevance of the metabarcoding approach for phytoplankton monitoring by considering different metrics: alpha diversity, taxonomic composition, community structure and a phytoplankton biotic index used to assess the trophic level of lakes. For this purpose, 55 samples were collected in four large alpine lakes (Aiguebelette, Annecy, Bourget, Geneva) during the year 2021. For each sample, a metabarcoding analysis based on two genetic markers (16S and 23S rRNA) was performed, in addition to the microscopic count. Regarding the trophic level of lakes, significant differences were found between index values obtained with the two approaches. The main hypothesis to explain these differences comes from the incompleteness, particularly at the species level, of the barcode reference library for the two genetic markers. It is therefore necessary to complete reference libraries for using such species-based biotic indices with metabarcoding data. Besides this, species richness and diversity were higher in the molecular inventories than in the microscopic ones. Moreover, despite differences in taxonomic composition of the floristic lists obtained by the two approaches, their community structures were similar. These results support the possibility of using metabarcoding for phytoplankton monitoring but in a different way. We suggest exploring alternative approaches to index development, such as a taxonomy-free approach.
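To make the alpha diversity comparison in this abstract concrete, here is a minimal Python sketch of two common alpha diversity metrics, species richness and the Shannon index, computed from per-taxon counts. The abstract does not say which indices or software the authors used, so this illustrates the general metrics only; the function names, taxon labels and counts are all hypothetical.

```python
import math


def richness(counts: dict[str, int]) -> int:
    """Number of taxa observed at least once in the sample."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    proportions = (n / total for n in counts.values() if n > 0)
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical inventories of one lake sample from the two approaches
# (cell counts for microscopy, read counts for metabarcoding).
microscopy = {"Taxon A": 120, "Taxon B": 30, "Taxon C": 0}
metabarcoding = {"Taxon A": 400, "Taxon B": 350, "Taxon C": 250, "Taxon D": 50}

for name, inventory in (("microscopy", microscopy), ("metabarcoding", metabarcoding)):
    print(name, richness(inventory), round(shannon_index(inventory), 3))
```

Both functions skip zero counts, so a taxon that appears in the table but was not observed in a sample does not affect the result.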
Affiliation(s)
- Maria Mercedes Nicolosi Gelis
  - Instituto de Limnología Dr. Raúl A. Ringuelet, CONICET-UNLP, Argentina; UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France
- Alexis Canino
  - UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France
- Agnès Bouchez
  - UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France
- Isabelle Domaizon
  - UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France
- Christophe Laplace-Treyture
  - Pole R&D ECLA Ecosystèmes Lacustres, France; UR EABX, INRAE, 50 avenue de Verdun, FR - 33612 Cestas cedex, France
- Frédéric Rimet
  - UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France
- Benjamin Alric
  - UMR CARRTEL, INRAE, Université Savoie Mont Blanc, 75bis av. De Corzent - CS 50511, FR - 74203 Thonon-les-Bains cedex, France; Pole R&D ECLA Ecosystèmes Lacustres, France
|
3
|
Paula DP, Barros SKA, Pitta RM, Barreto MR, Togawa RC, Andow DA. Metabarcoding versus mapping unassembled shotgun reads for identification of prey consumed by arthropod epigeal predators. Gigascience 2022; 11:giac020. [PMID: 35333301 PMCID: PMC8952265 DOI: 10.1093/gigascience/giac020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/24/2021] [Revised: 12/07/2021] [Accepted: 02/09/2022] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: A central challenge of DNA gut content analysis is to identify prey in a highly degraded DNA community. In this study, we evaluated prey detection using metabarcoding and a method of mapping unassembled shotgun reads (Lazaro). RESULTS: In a mock prey community, metabarcoding did not detect any prey, probably owing to primer choice and/or preferential predator DNA amplification, while Lazaro detected prey with 43-71% accuracy. Gut content analysis of field-collected arthropod epigeal predators (3 ants, 1 dermapteran, and 1 carabid) from agricultural habitats in Brazil (27 samples, 46-273 individuals per sample) revealed that 64% of the prey species detections by either method were not confirmed by melting curve analysis and 87% of the true prey were detected in common. We hypothesized that Lazaro would detect fewer true- and false-positive and more false-negative prey with greater taxonomic resolution than metabarcoding but found that the methods were similar in sensitivity, specificity, false discovery rate, false omission rate, and accuracy. There was a positive correlation between the relative prey DNA concentration in the samples and the number of prey reads detected by Lazaro, while this was inconsistent for metabarcoding. CONCLUSIONS: Metabarcoding and Lazaro had similar, but partially complementary, detection of prey in arthropod predator guts. However, while Lazaro was almost 2× more expensive, the number of reads was related to the amount of prey DNA, suggesting that Lazaro may provide quantitative prey information while metabarcoding did not.
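The five performance measures named in this abstract have standard definitions in terms of true/false positives and negatives. The sketch below computes them in Python from a confusion matrix. The counts are hypothetical and are not taken from the study, and the `Confusion` class and `detection_metrics` function are illustrative names rather than part of the authors' tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # prey detected and confirmed
    fp: int  # prey detected but not confirmed
    tn: int  # prey correctly not detected
    fn: int  # confirmed prey that the method missed


def detection_metrics(c: Confusion) -> dict[str, float]:
    """Standard detection metrics; a ratio with a zero denominator is NaN."""

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "false_discovery_rate": ratio(c.fp, c.tp + c.fp),
        "false_omission_rate": ratio(c.fn, c.fn + c.tn),
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


# Hypothetical counts for one detection method.
example = Confusion(tp=8, fp=2, tn=85, fn=5)
metrics = detection_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the confusion matrix of each method gives a like-for-like comparison of the kind the abstract reports.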
Affiliation(s)
- Débora Pires Paula
  - Embrapa Genetic Resources and Biotechnology, Brasília-DF, 70770-917, Brazil
- David A Andow
  - Department of Entomology, University of Minnesota, St. Paul, MN 55108, USA
|
4
|
Cao H, Xu D, Zhang T, Ren Q, Xiang L, Ning C, Zhang Y, Gao R. Comprehensive and functional analyses reveal the genomic diversity and potential toxicity of Microcystis. Harmful Algae 2022; 113:102186. [PMID: 35287927 DOI: 10.1016/j.hal.2022.102186] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/14/2021] [Revised: 01/11/2022] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
Microcystis is a cyanobacterium that is widely distributed across the world. It has attracted great attention because it produces the hepatotoxin microcystin (MC), which can inhibit eukaryotic protein phosphatases and pose a great risk to animal and human health. Due to the high diversity of morphospecies and genomes, it is still difficult to classify Microcystis species. In this study, we investigated the pangenome of 23 Microcystis strains to detect the genetic diversity and evolutionary dynamics. Microcystis revealed an open pangenome containing 22,009 gene families and exhibited different functional constraints. The core-genome phylogenetic analysis accurately differentiated the toxic and nontoxic strains and could be used as a taxonomic standard at the genetic level. We also investigated the functions of horizontal gene transfer (HGT) events, most of which originated from cyanobacteria and closely related species. In order to detect the potential toxicity of Microcystis, we searched for and characterized MC biosynthetic gene clusters and other secondary metabolite gene clusters. Our work provides insights into the genetic diversity, evolutionary dynamics, and potential toxicity of Microcystis, which could benefit species classification and the development of new methods for drinking water quality control and management of bloom formation in the future.
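The pangenome terminology in this abstract can be made concrete with a small Python sketch that partitions gene families into pan, core, accessory and strain-specific sets from a presence/absence table. The abstract does not describe the clustering software or thresholds the authors used, so this shows only the set logic; the strain names, gene family identifiers and the `partition_pangenome` function are hypothetical.

```python
from collections import Counter


def partition_pangenome(strains: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into pan, core, accessory and strain-specific sets.

    `strains` maps a strain name to the set of gene families found in it.
    """
    if not strains:
        return {"pan": set(), "core": set(), "accessory": set(), "unique": set()}
    pan = set().union(*strains.values())          # present in at least one strain
    core = set.intersection(*strains.values())    # present in every strain
    occurrences = Counter(f for families in strains.values() for f in families)
    unique = {f for f, n in occurrences.items() if n == 1}
    return {"pan": pan, "core": core, "accessory": pan - core, "unique": unique}


# Hypothetical presence/absence data for three strains.
toy_strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}

parts = partition_pangenome(toy_strains)
for name, families in parts.items():
    print(name, sorted(families))
```

A pangenome is usually called open when the pan set keeps growing as strains are added. Checking that requires recomputing the partition over increasing subsets of strains, which this sketch leaves out.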
Affiliation(s)
- Hengchun Cao
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Da Xu
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Tiantian Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Qiufang Ren
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Li Xiang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Chunhui Ning
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, Shandong, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, Jinan 250061, Shandong, China
|
5
|
Kamal S, Ripon SH, Dey N, Ashour AS, Santhi V. A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset. Computer Methods and Programs in Biomedicine 2016; 131:191-206. [PMID: 27265059 DOI: 10.1016/j.cmpb.2016.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/09/2015] [Revised: 03/18/2016] [Accepted: 04/06/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND: In the age of the information superhighway, big data play a significant role in information processing, extraction, retrieval and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential. METHOD: In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using a k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with a reduced number of elements or instances. These reduced data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. RESULTS: To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of its accuracy. CONCLUSIONS: The results show that the MapReduce-based K-NN classifier provided accurate results for big DNA data.
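As a rough illustration of how a K-NN classifier can be split into map and reduce steps, the Python sketch below has each data partition report its own k nearest neighbours of a query (map), then merges those candidate lists down to the global k nearest (reduce) before a majority vote. This is a generic single-process pattern written under stated assumptions, not the authors' framework: the abstract gives no implementation details, and all function names, data and the squared Euclidean distance are hypothetical choices.

```python
import heapq
from collections import Counter
from functools import reduce

# A labelled point: (feature vector, class label).
Point = tuple[tuple[float, ...], str]
# A scored candidate: (squared distance to the query, class label).
Candidate = tuple[float, str]


def sq_dist(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition: list[Point], query: tuple[float, ...], k: int) -> list[Candidate]:
    """Map step: the k nearest candidates found inside one partition."""
    scored = ((sq_dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_candidates(left: list[Candidate], right: list[Candidate], k: int) -> list[Candidate]:
    """Reduce step: merge two candidate lists and keep the k nearest."""
    return heapq.nsmallest(k, left + right)


def knn_classify(partitions: list[list[Point]], query: tuple[float, ...], k: int) -> str:
    """Classify `query` by majority vote among its k nearest neighbours."""
    local = [map_partition(p, query, k) for p in partitions]
    nearest = reduce(lambda a, b: reduce_candidates(a, b, k), local, [])
    if not nearest:
        raise ValueError("no training points supplied")
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data split across two partitions.
toy_partitions = [
    [((0.0, 0.0), "A"), ((0.1, 0.2), "A")],
    [((5.0, 5.0), "B"), ((0.2, 0.1), "A"), ((4.8, 5.1), "B")],
]

print(knn_classify(toy_partitions, (0.0, 0.1), k=3))
print(knn_classify(toy_partitions, (5.0, 5.0), k=3))
```

Because each partition returns at most k candidates, the reduce step never handles more than 2k items at a time, which is what keeps the pattern workable on data too large for one machine.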
Affiliation(s)
- Sarwar Kamal
  - Computer Science and Engineering, East West University, Dhaka, Bangladesh
- Nilanjan Dey
  - Techno India Institute of Technology, Kolkata, India
- Amira S Ashour
  - Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Tanta, Egypt
- V Santhi
  - School of Computing Science and Engineering, VIT University, Vellore, Tamil Nadu, India
|
6
|
Degradation of 3-chloropropionic acid (3CP) by Pseudomonas sp. B6P isolated from a rice paddy field. Ann Microbiol 2009. [DOI: 10.1007/bf03175129] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 10/20/2022] Open
|
7
|
Thasif S, Hamdan S, Huyop F. Degradation of D,L-2-chloropropionic Acid by Bacterial Dehalogenases that Shows Stereospecificity and its Partial Enzymatic Characteristics. Biotechnology 2009. [DOI: 10.3923/biotech.2009.264.269] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
|
8
|
Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009; 6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022] Open
Abstract
Recent years have seen a huge increase in the generation of genomic and proteomic data. This has been due to improvements in current biological methodologies, the development of new experimental techniques and the use of computers as support tools. All these raw data are useless if they cannot be properly analysed, annotated, stored and displayed. Consequently, a vast number of resources have been created to present the data to the wider community. Annotation tools and databases provide the means to disseminate these data and to comprehend their biological importance. This review examines the various aspects of annotation: type, methodology and availability. Moreover, it pays special attention to novel annotation fields, such as that of phenotypes, and highlights recent efforts focused on integrating annotations.
Affiliation(s)
- Gabrielle A Reeves
  - EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
9
|
Wolfsberg TG, Madden TL. Sequence similarity searching using the BLAST family of programs. Curr Protoc Mol Biol 2008; Chapter 19:Unit 19.3. [PMID: 18265177 DOI: 10.1002/0471142727.mb1903s46] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
Database sequence similarity searching is carried out thousands of times each day by researchers worldwide and has become a very valuable tool. Over the years, a number of algorithms have been implemented to facilitate database searching. The BLAST (Basic Local Alignment Search Tool) family of sequence similarity search programs allows searches to be done quickly and easily, but with sensitive, yet rigorous statistical expectations. In this unit, which is a completely new version of its predecessor of the same title, the user learns how to access the databases, determine the correct searching strategies, and apply examples of BLAST searches to his or her own data.
Affiliation(s)
- T G Wolfsberg
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, Maryland, USA
|
10
|
Nordle AKL, Rios P, Gaulton A, Pulido R, Attwood TK, Tabernero L. Functional assignment of MAPK phosphatase domains. Proteins 2007; 69:19-31. [PMID: 17596826 DOI: 10.1002/prot.21477] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Mitogen-activated protein kinase (MAPK) pathways are well conserved in most organisms, from yeast to humans. The principal components of these pathways are MAP kinases whose activity is regulated by phosphorylation, implicating various MAPK protein effectors, in particular protein phosphatases that inactivate MAPKs by dephosphorylation. The molecular basis of binding specificity of such regulatory phosphatases to MAPKs is poorly understood. To try to pinpoint potential functional regions within the sequences and to help identify new family members, we have applied a multimotif pattern-recognition approach to characterize two MAPK phosphatase subfamilies (tyrosine-specific and dual specificity) that are crucial in the regulation of MAPKs. We built "fingerprints" for these two subfamilies that are unique to, and highly discriminatory for, each group of proteins. The fingerprints were used in a genome-wide screen, identifying more than 80 MAPK phosphatase domains, several of which were in partial sequences or unclassified proteins. We confirmed experimentally that one predicted MAPK phosphatase orthologue in Xenopus binds to ERK1/2, suggesting a role in MAPK signaling and thus supporting our functional predictions. Further analysis, mapping the fingerprints on the three-dimensional structure of MAPK phosphatases, revealed that some of the fingerprint motifs reside in the N-terminal noncatalytic regions coinciding with reported MAPK binding sites, while others lie within the catalytic phosphatase domain. These results also suggest the presence of putative allosteric sites in the catalytic region for modulation of protein-protein interactions, and provide a framework for future experimental validation.
Affiliation(s)
- Anna K L Nordle
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
|
11
|
Oberacher H, Niederstätter H, Casetta B, Parson W. Detection of DNA Sequence Variations in Homo- and Heterozygous Samples via Molecular Mass Measurements by Electrospray Ionization Time-of-Flight Mass Spectrometry. Anal Chem 2005; 77:4999-5008. [PMID: 16053315 DOI: 10.1021/ac050399f] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
Abstract
The potential of ion-pair reversed-phase high-performance liquid chromatography on-line hyphenated to electrospray ionization time-of-flight mass spectrometry for the characterization of polymerase chain reaction (PCR) amplified nucleic acids was evaluated. For that purpose, a "SNP toolbox" was constructed by cloning and PCR-mediated site-directed in vitro mutagenesis at nucleotide position (ntp) 16,519 of a sequence-verified fragment of the human mitochondrial genome (ntps 15,900-599). Confirmatory sequencing demonstrated that within the sequences of the clones one and the same base was mutated to all other bases. Using these clones or equimolar mixtures of these clones as PCR templates, 51-401-bp-long amplicons were generated, which were used to determine the upper size limits of PCR products for the unequivocal detection of sequence variations in homo- and heterozygous samples. Based on the high mass spectrometric performance of the applied time-of-flight mass spectrometer, the unequivocal genotyping of all kinds of single base exchanges in PCR amplicons from heterozygous samples with lengths up to 254 base pairs (bp) was demonstrated. Considering homozygous samples, the successful genotyping of single base substitutions in up to 401-bp-long PCR products was possible. Consequently, the described hyphenated technique represents one of the most powerful mass spectrometric genotyping assays available today.
Affiliation(s)
- Herbert Oberacher
- Institute of Legal Medicine, Innsbruck Medical University, 6020 Innsbruck, Austria.
|
12
|
Hill KE, Weightman AJ. Horizontal transfer of dehalogenase genes on IncP1beta plasmids during bacterial adaptation to degrade alpha-halocarboxylic acids. FEMS Microbiol Ecol 2003; 45:273-82. [PMID: 19719596 DOI: 10.1016/s0168-6496(03)00158-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/16/2022] Open
Abstract
The diversity of bacterial alpha-halocarboxylic acid (alphaHA) dehalogenases from a polluted soil was investigated. Polymerase chain reaction (PCR) primers designed to amplify group I and group II dehalogenase (deh) gene sequences were used to screen bacterial isolates, nine beta-Proteobacteria and one gamma-Proteobacterium, from soil enrichments. Primers successfully amplified deh sequences from all 10 alphaHA-utilising isolates. Bacteria isolated at 15 or 30 °C on chloroacetic acid or 2-chloropropionic acid from the same polluted soil were shown to contain up to four plasmids, some of which were common between isolates. Analysis of deletion mutants and Southern hybridisation showed that each isolate contained an apparently identical IncP1beta plasmid c. 80 kb in size, carrying group I deh genes in addition to an associated insertion sequence element. Moreover, an identical conjugative catabolic plasmid was isolated exogenously in several transconjugants independently selected from biparental matings between Ralstonia eutropha JMP222 and enrichment samples. PCR cloning and sequencing of deh genes directly from enrichment cultures inoculated with the same soil revealed that an identical deh gene was present in primary, secondary and tertiary enrichment cultures, although this deh could not be amplified directly from soil. Two alphaHA-utilising bacteria isolated at lower temperature were found also to contain group II deh genes. Transfer of the deh catabolic phenotype to R. eutropha strain JMP222 occurred at high frequencies for four strains tested, a result that was consistent with assignment of the plasmids to the IncP1 incompatibility group. The promiscuous nature and broad host range of IncP plasmids make them likely to be involved in horizontal gene transfer during adaptation of bacteria to degrade organohalogens.
Affiliation(s)
- Katja E Hill
- School of Biosciences, Cardiff University, P.O. Box 915, Cardiff CF10 3TL, UK.
|
13
|
Möller S, Schroeder M, Apweiler R. Consistent integration of non-reliable heterogeneous information resources applied to the annotation of transmembrane proteins. Computers & Chemistry 2001; 26:41-9. [PMID: 11765850 DOI: 10.1016/s0097-8485(01)00098-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Information agents integrate multiple distributed heterogeneous information sources. The challenging and still unsolved problem is to ensure the semantic consistency of the integrated data. In this paper we set out to develop a general approach to inconsistency management for information agents. It is implemented as part of the EDITtoTrEMBL system and applied to a large real-world problem in the domain of bioinformatics.
Affiliation(s)
- S Möller
- European Bioinformatics Institute, Cambridge, UK.
|
14
|
Abstract
The chapter gives an overview of bioinformatic techniques of importance in protein analysis. These include database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given in relation to each topic. Databases with biological information are reviewed with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (Swissprot, PIR, TrEMBL, GenePept), and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. An introduction to databases of sequence patterns and protein families is also given (Prosite, Pfam, Blocks). Furthermore, the chapter describes the widespread methods for sequence comparisons, FASTA and BLAST, and the corresponding WWW services. The techniques involving multiple sequence alignments are also reviewed: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages and tree display using Drawtree, njplot or phylo_win. Finally, the chapter also treats the issue of structural prediction. Different methods for secondary structure predictions are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and post-translational modifications are also reviewed.
Affiliation(s)
- B Persson
- Stockholm Bioinformatic Centre, Sweden
|
15
|
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 2000; 17:540-52. [PMID: 10742046 DOI: 10.1093/oxfordjournals.molbev.a026334] [Citation(s) in RCA: 7163] [Impact Index Per Article: 286.5] [Indexed: 11/14/2022] Open
Abstract
The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.
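The block-selection idea described in this abstract can be illustrated with a deliberately simplified sketch. It is not the published program or its actual rules: the function names, the `min_fraction` and `min_block` thresholds, and the toy alignment are all assumptions made for illustration, and the flanking-position criterion mentioned in the abstract is left out.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.5):
    """Flag each alignment column as conserved (True) or not (False).

    A column counts as conserved here if it has no gap character and its most
    common residue is found in at least min_fraction of the sequences.
    Both criteria are illustrative assumptions.
    """
    n_seqs = len(alignment)
    flags = []
    for column in zip(*alignment):
        if "-" in column:
            flags.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / n_seqs >= min_fraction)
    return flags


def select_blocks(alignment, min_fraction=0.5, min_block=3):
    """Return half-open (start, end) ranges of contiguous conserved columns
    that are at least min_block columns long."""
    flags = conserved_columns(alignment, min_fraction)
    blocks, start = [], None
    # The trailing False is a sentinel that closes a run ending at the last column.
    for i, ok in enumerate(flags + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_block:
                blocks.append((start, i))
            start = None
    return blocks


def trim(alignment, blocks):
    """Keep only the selected blocks from every sequence."""
    return ["".join(seq[s:e] for s, e in blocks) for seq in alignment]


toy_alignment = ["ACGT-ACGTA", "ACGTTACGAA", "ACGAAACGTA"]  # hypothetical data
toy_blocks = select_blocks(toy_alignment)
toy_trimmed = trim(toy_alignment, toy_blocks)
```

With the toy data, the single gapped column is dropped and the two flanking runs are kept, so the trimmed alignment is one column shorter than the input.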
Affiliation(s)
- J Castresana
- European Molecular Biology Laboratory, Heidelberg, Germany.
|
16
|
Hu JM, Lavin M, Wojciechowski MF, Sanderson MJ. Phylogenetic systematics of the tribe Millettieae (Leguminosae) based on chloroplast trnK/matK sequences and its implications for evolutionary patterns in Papilionoideae. American Journal of Botany 2000; 87:418-430. [PMID: 10719003 DOI: 10.2307/2656638] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.6] [Indexed: 05/18/2023]
Abstract
Phylogenetic relationships in the tribe Millettieae and allies in the subfamily Papilionoideae (Leguminosae) were reconstructed from chloroplast trnK/matK sequences. Sixty-two accessions representing 57 traditionally recognized genera of Papilionoideae were sampled, including 27 samples from Millettieae. Phylogenies were constructed using maximum parsimony and are well resolved and supported by high bootstrap values. A well-supported "core Millettieae" clade is recognized, comprising the four large genera Millettia, Lonchocarpus, Derris, and Tephrosia. Several other small genera of Millettieae are not in the core Millettieae clade. Platycyamus is grouped with Phaseoleae (in part). Ostryocarpus, Austrosteenisia, and Dalbergiella are in neither the core Millettieae nor the Phaseoleae clade. These taxa, along with core Millettieae and Phaseoleae, form a monophyletic sister group to Indigofereae. Cyclolobium and Poecilanthe are close to Brongniartieae. Callerya and Wisteria belong to a large clade that includes all the legumes that lack the inverted repeat in their chloroplast genome, which confirms previous rbcL and phytochrome gene family phylogenies. The evolutionary history of four characters was examined in Millettieae and allies: the presence of canavanine, inflorescence types, the dehiscence of pods, and the presence of winged pods. trnK/matK sequence analysis suggests that the presence of a pseudoraceme or pseudopanicle and the accumulation of nonprotein amino acids are phylogenetically informative for Millettieae and allies with only a few exceptions.
Affiliation(s)
- J M Hu
- Section of Evolution and Ecology, University of California, Davis, California 95616 USA
|
17
|
Raponi M, Atkins D, Dawes IW, Arndt GM. The influence of antisense gene location on target gene suppression in the fission yeast Schizosaccharomyces pombe. Antisense & Nucleic Acid Drug Development 2000; 10:29-34. [PMID: 10726658 DOI: 10.1089/oli.1.2000.10.29] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
A fission yeast model was employed to investigate the influence of antisense gene location on the efficacy of antisense RNA-mediated target gene suppression. Fission yeast transformants were generated that contained the target lacZ gene at a fixed position and a single copy antisense lacZ gene integrated into various genomic locations, including the same locus as the target gene. No significant difference in lacZ suppression was observed when the antisense gene was integrated in close proximity to the target gene locus compared with other genomic locations, indicating that target and antisense gene colocalization is not a critical factor for efficient antisense RNA-mediated gene expression in vivo. Instead, increased lacZ downregulation correlated with an increase in antisense dose, with the steady-state levels of antisense RNA being dependent on genomic position effects and transgene copy number.
Affiliation(s)
- M Raponi
- Department of Biochemistry and Molecular Genetics, University of New South Wales, Sydney, Australia
|
18
|
van Batenburg FH, Gultyaev AP, Pleij CW, Ng J, Oliehoek J. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Res 2000; 28:201-4. [PMID: 10592225 PMCID: PMC102383 DOI: 10.1093/nar/28.1.201] [Citation(s) in RCA: 114] [Impact Index Per Article: 4.6] [Received: 07/28/1999] [Revised: 09/03/1999] [Accepted: 09/22/1999] [Indexed: 11/13/2022] Open
Abstract
PseudoBase is a database containing structural, functional and sequence data related to RNA pseudoknots. It can be reached at http://wwwbio.LeidenUniv.nl/~Batenburg/PKB.html. This page will direct the user to a retrieval page from where a particular pseudoknot can be chosen, or to a submission page which enables the user to add pseudoknot information to the database or to an informative page that elaborates on the various aspects of the database. For each pseudoknot, 12 items are stored, e.g. the nucleotides of the region that contains the pseudoknot, the stem positions of the pseudoknot, the EMBL accession number of the sequence that contains this pseudoknot and the support that can be given regarding the reliability of the pseudoknot. Access is via a small number of steps, using 16 different categories. The database was developed using the evolutionary methodology for software development rather than the classical waterfall model or the more modern spiral model.
Affiliation(s)
- F H van Batenburg
- Group Theoretical Biology, Institute of Evolutionary and Ecological Sciences, Leiden University, Kaiserstraat 63, 2311GP Leiden, The Netherlands.
|
19
|
Bioinformatics, robust realm based upon multidisciplinary knowledge of biological data and computational techniques. Chinese Science Bulletin-Chinese 1999. [DOI: 10.1007/bf02886336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
20
|
Moore RC, Lee IY, Silverman GL, Harrison PM, Strome R, Heinrich C, Karunaratne A, Pasternak SH, Chishti MA, Liang Y, Mastrangelo P, Wang K, Smit AF, Katamine S, Carlson GA, Cohen FE, Prusiner SB, Melton DW, Tremblay P, Hood LE, Westaway D. Ataxia in prion protein (PrP)-deficient mice is associated with upregulation of the novel PrP-like protein doppel. J Mol Biol 1999; 292:797-817. [PMID: 10525406 DOI: 10.1006/jmbi.1999.3108] [Citation(s) in RCA: 376] [Impact Index Per Article: 14.5] [Indexed: 11/22/2022]
Abstract
The novel locus Prnd is 16 kb downstream of the mouse prion protein (PrP) gene Prnp and encodes a 179 residue PrP-like protein designated doppel (Dpl). Prnd generates major transcripts of 1.7 and 2.7 kb as well as some unusual chimeric transcripts generated by intergenic splicing with Prnp. Like PrP, Dpl mRNA is expressed during embryogenesis but, in contrast to PrP, it is expressed minimally in the CNS. Unexpectedly, Dpl is upregulated in the CNS of two PrP-deficient (Prnp(0/0)) lines of mice, both of which develop late-onset ataxia, suggesting that Dpl may provoke neurodegeneration. Dpl is the first PrP-like protein to be described in mammals, and since Dpl seems to cause neurodegeneration similar to PrP, the linked expression of the Prnp and Prnd genes may play a previously unrecognized role in the pathogenesis of prion diseases or other illnesses.
Affiliation(s)
- R C Moore
- Institute for Neurodegenerative Diseases, Departments of Neurology
|
21
|
Krawczak M, Chuzhanova NA, Cooper DN. Evolution of the proximal promoter region of the mammalian growth hormone gene. Gene 1999; 237:143-51. [PMID: 10524245 DOI: 10.1016/s0378-1119(99)00313-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
The evolutionary relationship between the proximal growth hormone (GH) gene promoter sequences of 12 mammalian species was explored by comparison of their trinucleotide composition and by multiple sequence alignment. Both approaches yielded results that were consistent with the known fossil record-based phylogeny of the analysed sequences, suggesting that the two methods of tree reconstruction might be equally efficient and reliable. The pattern of evolution inferred for the mammalian GH gene promoters was found to vary both temporally and spatially. Thus, two distinct regions devoid of any evolutionary changes exist in primates, but only one of these 'gaps' is also observed in rodents, and neither is seen in ruminants. Furthermore, different evolutionary rates must have prevailed during different periods of evolutionary time and in different lineages, with a dramatic increase in evolutionary rate apparent in primates. Since a similar pattern of discontinuity has been previously noted for the evolution of the GH-coding regions, it may reflect the action of positive selection operating upon the GH gene as a single cohesive unit. Strong evidence for the action of gene conversion between primate GH gene promoters is provided by the fact that the human GH1 and GH2 sequences, which are thought to have diverged before the divergence of Old World monkeys from great apes, are more similar to one another than either is to the rhesus monkey GH2 promoter. Finally, it was noted that a number of nucleotide positions in the GH1 gene promoter that are polymorphic in humans appear to be highly conserved in mammals. This apparent conundrum, which could represent a caveat for the interpretation of phylogenetic footprinting studies, is potentially explicable in terms either of reduced genetic diversity in highly inbred animal species or insufficient population data from non-human species.
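As a rough illustration of what comparing sequences by trinucleotide composition can look like, the sketch below builds a 64-element frequency vector of overlapping trinucleotides for each sequence and takes the Euclidean distance between two such vectors. The abstract does not state which distance measure or counting scheme the authors used, so the choice of Euclidean distance, the overlapping windows, and all function names are assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 possible trinucleotides over the DNA alphabet, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_frequencies(seq):
    """Relative frequency of each overlapping trinucleotide in seq.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


def composition_distance(seq_a, seq_b):
    """Euclidean distance between two trinucleotide frequency vectors
    (an assumed measure, chosen only for illustration)."""
    fa = trinucleotide_frequencies(seq_a)
    fb = trinucleotide_frequencies(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

A matrix of such pairwise distances could then be passed to any standard distance-based tree-building method; that step is not shown here.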
Affiliation(s)
- M Krawczak
- Institute of Medical Genetics, University of Wales College of Medicine, Heath Park Cardiff CF4 4XN, UK.
|
22
|
Jareborg N, Birney E, Durbin R. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res 1999; 9:815-24. [PMID: 10508839 PMCID: PMC310816 DOI: 10.1101/gr.9.9.815] [Citation(s) in RCA: 145] [Impact Index Per Article: 5.6] [Indexed: 11/24/2022]
Abstract
A data set of 77 genomic mouse/human gene pairs has been compiled from the EMBL nucleotide database, and their corresponding features determined. This set was used to analyze the degree of conservation of noncoding sequences between mouse and human. A new alignment algorithm was developed to cope with the fact that large parts of noncoding sequences are not alignable in a meaningful way because of genetic drift. This new algorithm, DNA Block Aligner (DBA), finds colinear-conserved blocks that are flanked by nonconserved sequences of varying lengths. The noncoding regions of the data set were aligned with DBA. The proportion of the noncoding regions covered by blocks >60% identical was 36% for upstream regions, 50% for 5' UTRs, 23% for introns, and 56% for 3' UTRs. These blocks of high identity were more or less evenly distributed across the length of the features, except for upstream regions in which the first 100 bp upstream of the transcription start site was covered in up to 70% of the gene pairs. This data set complements earlier sets on the basis of cDNA sequences and will be useful for further comparative studies. [This paper contains supplementary data that can be found at http://www.genome.org [corrected]].
Affiliation(s)
- N Jareborg
- The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
23
|
Pedersen AG, Baldi P, Chauvin Y, Brunak S. The biology of eukaryotic promoter prediction--a review. Computers & Chemistry 1999; 23:191-207. [PMID: 10404615 DOI: 10.1016/s0097-8485(99)00015-7] [Citation(s) in RCA: 136] [Impact Index Per Article: 5.2] [Indexed: 11/21/2022]
Abstract
Computational prediction of eukaryotic promoters from the nucleotide sequence is one of the most attractive problems in sequence analysis today, but it is also a very difficult one. Thus, current methods predict in the order of one promoter per kilobase in human DNA, while the average distance between functional promoters has been estimated to be in the range of 30-40 kilobases. Although it is conceivable that some of these predicted promoters correspond to cryptic initiation sites that are used in vivo, it is likely that most are false positives. This suggests that it is important to carefully reconsider the biological data that forms the basis of current algorithms, and we here present a review of data that may be useful in this regard. The review covers the following topics: (1) basal transcription and core promoters, (2) activated transcription and transcription factor binding sites, (3) CpG islands and DNA methylation, (4) chromosomal structure and nucleosome modification, and (5) chromosomal domains and domain boundaries. We discuss the possible lessons that may be learned, especially with respect to the wealth of information about epigenetic regulation of transcription that has been appearing in recent years.
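The figures quoted in this abstract imply a lower bound on the false-positive rate, which the short calculation below makes explicit. It assumes, as a simplification not stated in the abstract, that at most one prediction in each promoter-to-promoter interval can be correct; the function name is hypothetical.

```python
def min_false_positive_fraction(predictions_per_kb, kb_per_true_promoter):
    """Lower bound on the fraction of predictions that are false positives,
    assuming at most one true promoter per interval of the given length."""
    predicted = predictions_per_kb * kb_per_true_promoter
    true_at_most = min(1.0, predicted)
    return (predicted - true_at_most) / predicted


# One prediction per kilobase, one functional promoter every 30 to 40 kilobases.
low = min_false_positive_fraction(1.0, 30.0)   # 29/30, about 0.967
high = min_false_positive_fraction(1.0, 40.0)  # 39/40, exactly 0.975
```

Under these assumptions roughly 97% of predictions would be false positives, which is consistent with the abstract's statement that most are likely to be.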
Affiliation(s)
- A G Pedersen
- Department of Biotechnology, Technical University of Denmark, Lyngby, Denmark.
|
24
|
Nogales B, Moore ER, Abraham WR, Timmis KN. Identification of the metabolically active members of a bacterial community in a polychlorinated biphenyl-polluted moorland soil. Environ Microbiol 1999; 1:199-212. [PMID: 11207739 DOI: 10.1046/j.1462-2920.1999.00024.x] [Citation(s) in RCA: 130] [Impact Index Per Article: 5.0] [Indexed: 11/20/2022]
Abstract
The presumptive metabolically active members of a bacterial community in a moorland soil in Germany, highly polluted with polychlorinated biphenyls (PCBs), were identified by sequencing of cloned reverse transcription-polymerase chain reaction (RT-PCR) amplification products of 16S rRNA generated from total RNA extracts. Analysis of the 16S rRNA clone library revealed a considerable diversity of metabolically active bacteria in the soil, despite the acidic pH and high concentrations of PCBs. Cloned sequence types clustered within the Proteobacteria (34% alpha-, 33% beta- and 7% gamma-subclasses), the Holophaga-Acidobacterium phylum (14%), the Actinobacteria (6.5%) and the Planctomycetales (2%). Three cloned sequence types were not affiliated to any described phylogenetic group. An unusual feature of this soil was the abundance of sequence types within the beta-subclass of the Proteobacteria, most of which were similar to the 16S rRNA gene sequences of species from only two genera, Burkholderia and Variovorax. Three other numerous 16S rRNA sequence types were similar to the sequences of Sphingomonas species, members of the Rhodopila globiformis group and Acidobacterium capsulatum. Some of the sequence types retrieved were similar to the 16S rRNA sequences of bacterial isolates able to degrade a variety of organic pollutants, including PCBs. As the PCB contamination is the major source of measurable carbon in this soil, some of the 16S rRNA sequence types detected and presumed to represent the metabolically active members of the community indicate the organisms likely to be involved, directly or indirectly, in the utilization of the PCBs as carbon and energy sources.
Affiliation(s)
- B Nogales
- Division of Microbiology, GBF-National Research Centre for Biotechnology, Braunschweig, Germany.
|
25
|
Wolfsberg TG, Madden TL. Sequence Similarity Searching Using the BLAST Family of Programs. Curr Protoc Protein Sci 1999; Chapter 2:Unit2.5. [DOI: 10.1002/0471140864.ps0205s15] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Affiliation(s)
- Tyra G. Wolfsberg
- National Center for Biotechnology Information, National Library of Medicine, NIH Bethesda Maryland
- Thomas L. Madden
- National Center for Biotechnology Information, National Library of Medicine, NIH Bethesda Maryland
|
26
|
Rheims H, Felske A, Seufert S, Stackebrandt E. Molecular monitoring of an uncultured group of the class Actinobacteria in two terrestrial environments. J Microbiol Methods 1999; 36:65-75. [PMID: 10353801 DOI: 10.1016/s0167-7012(99)00012-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Previous investigations of 16S rRNA clone libraries from a wide spectrum of mainly terrestrial origin have shown the worldwide distribution of several as yet uncultivated phylogenetically deeply rooting groups of Actinobacteria. From the percentage of the occurrence of these clones it was concluded that these organisms constitute a significant part of the bacterial microflora in these habitats. Two of the clone groups, previously designated group II and group III, were shown to be moderately related to each other phylogenetically. In order to determine more exactly the abundance of a representative of group II, clone DA079, the fraction of the organism's rRNA in total extracted rRNA was determined in several neighboring samples from Drentse A grassland soil (The Netherlands). The fraction ranged from 2.6 to 9.1%, averaging 5.5%. Based upon comparison of total rRNA and strain DA079-specific rRNA it was concluded that on average 2 × 10^6 cells/g of this organism are present in the investigated soil. Attempts to isolate members of one of the 16S rDNA clone groups of Actinobacteria were made with samples from a German peat bog, in which the organisms had been detected previously. Molecular detection of group III organisms by a nested PCR approach was possible in different cultivation media. Despite the wide spectrum of growth media employed, the isolation of group III strains failed.
Affiliation(s)
- H Rheims
- DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Braunschweig, Germany
|
27
|
Carazo JM, Stelzer EH. The BioImage Database Project: organizing multidimensional biological images in an object-relational database. J Struct Biol 1999; 125:97-102. [PMID: 10222266 DOI: 10.1006/jsbi.1999.4103] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
The BioImage Database Project collects and structures multidimensional data sets recorded by various microscopic techniques relevant to modern life sciences. It provides, as precisely as possible, the circumstances in which the sample was prepared and the data were recorded. It grants access to the actual data and maintains links between related data sets. In order to promote the interdisciplinary approach of modern science, it offers a large set of key words, which covers essentially all aspects of microscopy. Nonspecialists can, therefore, access and retrieve significant information recorded and submitted by specialists in other areas. A key issue of the undertaking is to exploit the available technology and to provide a well-defined yet flexible structure for dealing with data. Its pivotal element is, therefore, a modern object relational database that structures the metadata and ameliorates the provision of a complete service. The BioImage database can be accessed through the Internet.
Affiliation(s)
- J M Carazo
- Centro Nacional de Biotecnología-CSIC, Campus Universidad Autonoma, Madrid, E-28049, Spain
|
28
|
Hill KE, Marchesi JR, Weightman AJ. Investigation of two evolutionarily unrelated halocarboxylic acid dehalogenase gene families. J Bacteriol 1999; 181:2535-47. [PMID: 10198020 PMCID: PMC93682 DOI: 10.1128/jb.181.8.2535-2547.1999] [Citation(s) in RCA: 71] [Impact Index Per Article: 2.7] [Received: 09/01/1998] [Accepted: 01/29/1999] [Indexed: 11/20/2022] Open
Abstract
Dehalogenases are key enzymes in the metabolism of halo-organic compounds. This paper describes a systematic approach to the isolation and molecular analysis of two families of bacterial alpha-halocarboxylic acid (alphaHA) dehalogenase genes, called group I and group II deh genes. The two families are evolutionarily unrelated and together represent almost all of the alphaHA deh genes described to date. We report the design and evaluation of degenerate PCR primer pairs for the separate amplification and isolation of group I and II deh genes. Amino acid sequences derived from 10 of 11 group I deh partial gene products of new and previously reported bacterial isolates showed conservation of five residues previously identified as essential for activity. The exception, DehD from a Rhizobium sp., had only two of these five residues. Group II deh gene sequences were amplified from 54 newly isolated strains, and seven of these sequences were cloned and fully characterized. Group II dehalogenases were stereoselective, dechlorinating L- but not D-2-chloropropionic acid, and derived amino acid sequences for all of the genes except dehII°P11 showed conservation of previously identified essential residues. Molecular analysis of the two deh families highlighted four subdivisions in each, which were supported by high bootstrap values in phylogenetic trees and by enzyme structure-function considerations. Group I deh genes included two putative cryptic or silent genes, dehI°PP3 and dehI°17a, produced by different organisms. Group II deh genes included two cryptic genes and an active gene, dehIIPP3, that can be switched off and on. All alphaHA-degrading bacteria so far described were Proteobacteria, a result that may be explained by limitations either in the host range for deh genes or in isolation methods.
Collapse
Affiliation(s)
- K E Hill
- Cardiff School of Biosciences, Cardiff University, Cardiff, CF1 3TL, Wales, United Kingdom
|
29
|
Parsons JD, Buehler E, Hillier L. DNA sequence chromatogram browsing using JAVA and CORBA. Genome Res 1999; 9:277-81. [PMID: 10077534 PMCID: PMC310717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tags (ESTs) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware which provides a well-defined interface, a naming service, and location independence. [The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/~jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.]
Affiliation(s)
- J D Parsons
- EMBL-Outstation-The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
31
|
Abstract
The Human Genome Project is generating unprecedented quantities of new genetic information. The new discipline of bioinformatics has created many new molecular biology databanks to store the results of the Human Genome Project. This data is expected to be the information source for biomedical science in the 21st century. As molecular biology research moves out of the laboratory and becomes molecular medicine, a growing number of people need access to genetic information. Medical students, healthcare practitioners and patients need help in finding appropriate information. Medical librarians should know how to search key genetic information resources and should have a basic understanding of the type of information contained in each.
Affiliation(s)
- F Norman
- National Institute for Medical Research, London, UK
|
32
|
Attimonelli M, Cooper JM, D'Elia D, de Montalvo A, De Robertis M, Lehväslaiho H, Malladi SB, Memeo F, Stevens K, Schapira AH, Saccone C. Update of the Human MitBASE database. Nucleic Acids Res 1999; 27:143-6. [PMID: 9847160 PMCID: PMC148115 DOI: 10.1093/nar/27.1.143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
Human MitBASE is a database collecting human mtDNA variants. This database is part of a greater mitochondrial genome database (MitBASE) funded within the EU Biotech Program. The present paper reports the recent improvements in data structure, data quality and data quantity. As far as the database structure is concerned it is now fully designed and implemented. Based on the previously described structure some changes have been made to optimise both data input and data quality. Cross-references with other bio-databases (EMBL, OMIM, MEDLINE) have been implemented. Human MitBASE data can be queried with the MitBASE Simple Query System (http://www.ebi.ac.uk/htbin/Mitbase/mitbase.pl) and with SRS at the EBI under the 'Mutation' section (http://srs.ebi.ac.uk/srs5/). At present the Human MitBASE node contains approximately 5000 variants related to studies investigating population polymorphisms and pathologies.
Affiliation(s)
- M Attimonelli
- Dipartimento di Biochimica e Biologia Molecolare, Università degli Studi di Bari, 70126 Bari, Italy
|
33
|
de Pinto B, Malladi SB, Altamura N. MitBASE pilot: a database on nuclear genes involved in mitochondrial biogenesis and its regulation in Saccharomyces cerevisiae. Nucleic Acids Res 1999; 27:147-9. [PMID: 9847161 PMCID: PMC148116 DOI: 10.1093/nar/27.1.147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
In the framework of the EU BIOTECH PROGRAM and within the 'MITBASE: a comprehensive and integrated database on mtDNA' project, we have prepared a pilot database (MitBASE Pilot) on nuclear genes involved in mitochondrial biogenesis and its regulation in Saccharomyces cerevisiae. MitBASE Pilot includes nuclear genes encoding mitochondrial proteins as well as nuclear genes encoding products which are localised in other sub-cellular compartments but nevertheless interact with mitochondrial functions. Genes have been classified on the basis of the mitochondrial process in which they participate and the mitochondrial phenotype of the gene knockout. The structure of the MitBASE Pilot database has been conceived for a flexible organisation of the information. An intuitive visual query system has been developed which allows users to select information in different combinations, both in the query and the output format, according to their needs. MitBASE Pilot is a relational database, is maintained at the EMBL-European Bioinformatics Institute (EBI) and is available at the World Wide Web site http://www3.ebi.ac.uk/Research/Mitbase/mitbiog.pl
Affiliation(s)
- B de Pinto
- Consiglio Nazionale delle Ricerche, Centro di Studio sui Mitocondri e Metabolismo Energetico, presso Università di Bari, via Amendola 165/A, I-70126 Bari, Italy
|
34
|
Carone A, Malladi SB, Attimonelli M, Saccone C. Vertebrate MitBASE: a specialised database on vertebrate mitochondrial DNA sequences. Nucleic Acids Res 1999; 27:150-2. [PMID: 9847162 PMCID: PMC148117 DOI: 10.1093/nar/27.1.150] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/12/2022]
Abstract
Vertebrate MitBASE is a specialized database where all the vertebrate mitochondrial DNA entries from primary databases are collected, revised and integrated with new information emerging from the literature. Variant sequences are also analyzed, aligned and linked to reference sequences. Data related to the same species and fragment can be viewed over the WWW. The database has a flexible interface and a retrieval system to help non-expert users and contains information not currently available in the primary databases. Vertebrate MitBASE is now available through the MitBASE home page at URL: http://www.ebi.ac.uk/htbin/Mitbase/mitbase.pl. This work is part of a larger project, MitBASE, which is a network of databases covering the full panorama of knowledge on mitochondrial DNA from protists to human sequences.
Affiliation(s)
- A Carone
- Dipartimento di Biochimica e Biologia Molecolare, Universita' degli studi di Bari. Via E. Orabona 4, 70126 Bari, Italy and EBI, Hinxton Hall, Hinxton, Cambridge, UK
|
35
|
Maidak BL, Cole JR, Parker CT, Garrity GM, Larsen N, Li B, Lilburn TG, McCaughey MJ, Olsen GJ, Overbeek R, Pramanik S, Schmidt TM, Tiedje JM, Woese CR. A new version of the RDP (Ribosomal Database Project). Nucleic Acids Res 1999; 27:171-3. [PMID: 9847171 PMCID: PMC148126 DOI: 10.1093/nar/27.1.171] [Citation(s) in RCA: 676] [Impact Index Per Article: 26.0] [Indexed: 11/14/2022]
Abstract
The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.
Affiliation(s)
- B L Maidak
- Department of Microbiology, Michigan State University, 294 Giltner Hall, East Lansing, MI 48824-1101, USA
|
36
|
Sanchez C, Lachaize C, Janody F, Bellon B, Röder L, Euzenat J, Rechenmann F, Jacq B. Grasping at molecular interactions and genetic networks in Drosophila melanogaster using FlyNets, an Internet database. Nucleic Acids Res 1999; 27:89-94. [PMID: 9847149 PMCID: PMC148104 DOI: 10.1093/nar/27.1.89] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]
Abstract
FlyNets (http://gifts.univ-mrs.fr/FlyNets/FlyNets_home_page.html) is a WWW database describing molecular interactions (protein-DNA, protein-RNA and protein-protein) in the fly Drosophila melanogaster. It is composed of two parts, as follows. (i) FlyNets-base is a specialized database which focuses on molecular interactions involved in Drosophila development. The information content of FlyNets-base is distributed among several specific lines arranged according to a GenBank-like format and grouped into five thematic zones to improve human readability. The FlyNets database achieves a high level of integration with other databases such as FlyBase, EMBL, GenBank and SWISS-PROT through numerous hyperlinks. (ii) FlyNets-list is a very simple and more general databank, the long-term goal of which is to report on any published molecular interaction occurring in the fly, giving direct web access to corresponding abstracts in Medline and in FlyBase. In the context of genome projects, databases describing molecular interactions and genetic networks will provide a link at the functional level between the genome, the proteome and the transcriptome worlds of different organisms. Interaction databases therefore aim at describing the contents, structure, function and behaviour of what we herein define as the interactome world.
Affiliation(s)
- C Sanchez
- Laboratoire de Génétique et Physiologie du Développement, IBDM, Parc Scientifique de Luminy, CNRS Case 907, 13288 Marseille Cedex 09, France
|
37
|
Lefranc MP, Giudicelli V, Ginestoux C, Bodmer J, Müller W, Bontrop R, Lemaitre M, Malik A, Barbié V, Chaume D. IMGT, the international ImMunoGeneTics database. Nucleic Acids Res 1999; 27:209-12. [PMID: 9847182 PMCID: PMC148137 DOI: 10.1093/nar/27.1.209] [Citation(s) in RCA: 321] [Impact Index Per Article: 12.3] [Indexed: 11/14/2022]
Abstract
IMGT, the international ImMunoGeneTics database (http://imgt.cnusc.fr:8104), is a high-quality integrated database specialising in Immunoglobulins (Ig), T cell Receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species, created in 1989 by Marie-Paule Lefranc, Université Montpellier II, CNRS, Montpellier, France (lefranc@ligm.igh.cnrs.fr). IMGT comprises three databases: LIGM-DB, a comprehensive database of Ig and TcR, MHC/HLA-DB, and PRIMER-DB (the last two in development); a tool, IMGT/DNAPLOT, developed for sequence analysis and alignments; and expertised data based on the IMGT scientific chart, the IMGT repertoire. By its high quality and its easy data distribution, IMGT has important implications in medical research (repertoire in autoimmune diseases, AIDS, leukemias, lymphomas), therapeutic approaches (antibody engineering), genome diversity and genome evolution studies. IMGT is freely available at http://imgt.cnusc.fr:8104
Affiliation(s)
- M P Lefranc
- Laboratoire d'ImmunoGénétique Moléculaire, Université Montpellier II, UPR CNRS 1142 IGH, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France.
|
38
|
Attimonelli M, Altamura N, Benne R, Boyen C, Brennicke A, Carone A, Cooper JM, D'Elia D, de Montalvo A, de Pinto B, De Robertis M, Golik P, Grienenberger JM, Knoop V, Lanave C, Lazowska J, Lemagnen A, Malladi BS, Memeo F, Monnerot M, Pilbout S, Schapira AH, Sloof P, Slonimski P, Saccone C. MitBASE: a comprehensive and integrated mitochondrial DNA database. Nucleic Acids Res 1999; 27:128-33. [PMID: 9847157 PMCID: PMC148112 DOI: 10.1093/nar/27.1.128] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
MitBASE is an integrated and comprehensive database of mitochondrial DNA data which collects all available information from different organisms and from intraspecies variants and mutants. Research institutions from different countries are involved, each in charge of developing, collecting and annotating data for the organisms they are specialised in. The design of the actual structure of the database and its implementation in a user-friendly format are the responsibility of the European Bioinformatics Institute. The database can be accessed on the Web at the following address: http://www.ebi.ac.uk/htbin/Mitbase/mitbase.pl. The impact of this project is intended for both basic and applied research. The study of mitochondrial genetic diseases and mitochondrial DNA intraspecies diversity are key topics in several biotechnological fields. The database has been funded within the EU Biotechnology programme.
Affiliation(s)
- M Attimonelli
- Dipartimento di Biochimica e Biologia Molecolare, Università degli Studi di Bari, 70126 Bari, Italy
|
39
|
Hoogland C, Sanchez JC, Tonella L, Bairoch A, Hochstrasser DF, Appel RD. The SWISS-2DPAGE database: what has changed during the last year. Nucleic Acids Res 1999; 27:289-91. [PMID: 9847204 PMCID: PMC148159 DOI: 10.1093/nar/27.1.289] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
SWISS-2DPAGE (http://www.expasy.ch/ch2d/) is an annotated two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) database established in 1993. The current release contains 21 reference maps from human and mouse biological samples, as well as from Saccharomyces cerevisiae, Escherichia coli and Dictyostelium discoideum origin. These reference maps now have 2480 identified spots, corresponding to 528 separate protein entries in the database, in addition to virtual entries for each SWISS-PROT sequence. During the last year, the SWISS-2DPAGE has undergone major changes. Six new maps have been added, and new functions to access the data have been provided through the ExPASy server. Finally, an important change concerns the database funding source.
Affiliation(s)
- C Hoogland
- Swiss Institute of Bioinformatics, c/o Medical Informatics Division, Geneva University Hospital, 24 rue Micheli-du-Crest, 1211 Geneva 14, Switzerland.
|
40
|
Licciulli F, Catalano D, D'Elia D, Lorusso V, Attimonelli M. KEYnet: a keywords database for biosequences functional organization. Nucleic Acids Res 1999; 27:365-7. [PMID: 9847230 PMCID: PMC148185 DOI: 10.1093/nar/27.1.365] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022]
Abstract
KEYnet is a database where gene and protein names are hierarchically structured. Particular care has been devoted to the search and organisation of synonyms. The structuring is based on biological criteria in order to assist the user in the data search and to minimise the risk of loss of information. Links to the EMBL data library by the entry name and the accession number have been implemented. KEYnet is available through the World Wide Web at the following site: http://www.ba.cnr.it/keynet.html. Recently KEYnet has incorporated specific gene name classifications, which can be browsed starting from the above-mentioned KEYnet home page: the Mitochondrial Gene Names classification and the Rat Gene Names classification. The KEYnet database has also been structured in a flatfile format and can be queried through SRS (http://bio-www.ba.cnr.it:8000/srs).
Affiliation(s)
- F Licciulli
- Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Bari, 70126 Bari, Italy
|
41
|
Lanave C, Attimonelli M, De Robertis M, Licciulli F, Liuni S, Sbisá E, Saccone C. Update of AMmtDB: a database of multi-aligned metazoa mitochondrial DNA sequences. Nucleic Acids Res 1999; 27:134-7. [PMID: 9847158 PMCID: PMC148113 DOI: 10.1093/nar/27.1.134] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
The present paper describes AMmtDB, a database collecting the multi-aligned sequences of vertebrate mitochondrial genes coding for proteins and tRNAs, as well as the multiple alignment of the mammalian mtDNA main regulatory region (D-loop) sequences. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. As far as the genes coding for tRNAs are concerned, the multi-alignments based on the primary and the secondary structures are both provided; for the mammalian D-loop multi-alignments we report the conserved regions of the entire D-loop (CSB1, CSB2, CSB3, the central region, ETAS1 and ETAS2) as defined by Sbisà et al. [Gene (1997), 205, 125-140]. A flatfile format for AMmtDB has been designed allowing its implementation in SRS (http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB). Data selected through SRS can be managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operating system. The multiple alignments have been produced with CLUSTALV and PILEUP programs and then carefully optimized manually.
Affiliation(s)
- C Lanave
- Centro di Studio sui Mitocondri e Metabolismo Energetico, C.N.R., Via Amendola 165/A, 70126 Bari, Italy
|
42
|
Abstract
The ENZYME data bank is a repository of information related to the nomenclature of enzymes. In recent years it has become an indispensable resource for the development of metabolic databases. The current version contains information on 3704 enzymes. It is available through the ExPASy WWW server (http://www.expasy.ch/).
Affiliation(s)
- A Bairoch
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland.
|
43
|
Benson DA, Boguski MS, Lipman DJ, Ostell J, Ouellette BF, Rapp BA, Wheeler DL. GenBank. Nucleic Acids Res 1999; 27:12-7. [PMID: 9847132 PMCID: PMC148087 DOI: 10.1093/nar/27.1.12] [Citation(s) in RCA: 384] [Impact Index Per Article: 14.8] [Indexed: 11/14/2022]
Abstract
The GenBank® sequence database incorporates DNA sequences from all available public sources, primarily through the direct submission of sequence data from individual laboratories and from large-scale sequencing projects. Most submitters use the BankIt (Web) or Sequin programs to format and send sequence data. Data exchange with the EMBL Data Library and the DNA Data Bank of Japan helps ensure comprehensive worldwide coverage. GenBank data is accessible through NCBI's integrated retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome and protein structure information. MEDLINE® abstracts from published articles describing the sequences are included as an additional source of biological annotation through the PubMed search system. Sequence similarity searching is offered through the BLAST series of database search programs. In addition to FTP, Email, and server/client versions of Entrez and BLAST, NCBI offers a wide range of World Wide Web retrieval and analysis services based on GenBank data. The GenBank database and related resources are freely accessible via the URL: http://www.ncbi.nlm.nih.gov
Affiliation(s)
- D A Benson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
44
|
Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res 1999; 27:49-54. [PMID: 9847139 PMCID: PMC148094 DOI: 10.1093/nar/27.1.49] [Citation(s) in RCA: 331] [Impact Index Per Article: 12.7] [Indexed: 11/14/2022]
Abstract
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. Recent developments of the database include: cross-references to additional databases; a variety of new documentation files and improvements to TrEMBL, a computer annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT. The URLs for SWISS-PROT on the WWW are: http://www.expasy.ch/sprot and http://www.ebi.ac.uk/sprot
Affiliation(s)
- A Bairoch
- Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland.
|
45
|
Abstract
Since the complete sequence of Haemophilus influenzae Rd was obtained in 1995, the number of completely sequenced bacterial genomes has increased steadily. A problem is that the quality of the annotations of these very large sequences is usually lower than that of the shorter entries encountered in the repository collections. Moreover, classical sequence database management systems have difficulties in handling entries of that size. In this context, we have decided to build the Enhanced Microbial Genomes Library (EMGLib) in which these two problems are alleviated. This library contains all the complete genomes from bacteria already sequenced and the yeast genome in GenBank format. The annotations are improved by the introduction of data on codon usage, gene orientation on the chromosome and gene families. It is possible to access EMGLib through two database systems set up on World Wide Web servers: the PBIL server at http://pbil.univ-lyon1.fr/emglib/emglib.html and the MICADO server at http://locus.jouy.inra.fr/micado
Affiliation(s)
- G Perrière
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard - Lyon 1, 43, boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|
46
|
Abstract
The signal recognition particle database (SRPDB) is located at the University of Texas Health Science Center at Tyler and includes tabulations of SRP RNA, SRP protein and SRP receptor sequences. The sequences are annotated with links to the primary databases. They are ordered alphabetically or phylogenetically and are available in aligned form. As of September 1998, there were 108 SRP RNA sequences, 83 SRP protein sequences and 28 sequences of the SRP receptor alpha subunit and its homologues. In addition, the SRPDB provides search motifs consisting of conserved amino acid and nucleotide residues, and a limited number of SRP RNA secondary structure diagrams and 3-D models. The data are available freely at the URL http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html
Affiliation(s)
- T Samuelsson
- Department of Medical Biochemistry, University of Göteborg, Medicinareg. 9A, S-413 90, Göteborg, Sweden
|
47
|
Abstract
The PRESAGE database is a collaborative resource for structural genomics. It provides a database of proteins to which researchers add annotations indicating current experimental status, structural predictions and suggestions. The database is intended to enhance communication among structural genomics researchers and aid dissemination of their results. The PRESAGE database may be accessed at http://presage.stanford.edu/
Affiliation(s)
- S E Brenner
- Department of Structural Biology, Stanford University, Fairchild Building D-109, Stanford, CA 94305-5126, USA.
|
48
|
Murvai J, Vlahovicek K, Barta E, Szepesvári C, Acatrinei C, Pongor S. The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments. Nucleic Acids Res 1999; 27:257-9. [PMID: 9847195 PMCID: PMC148150 DOI: 10.1093/nar/27.1.257] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
The sixth release of the SBASE protein domain library sequences contains 130 703 annotated and cross-referenced entries corresponding to structural, functional, ligand-binding and topogenic segments of proteins. The entries were grouped based on standard names (2312 groups) and further classified on the basis of the BLAST similarity (2463 clusters). Automated searching with BLAST and a new sequence-plot representation of local domain similarities are available at the WWW-server http://www.icgeb.trieste.it/sbase. A mirror site is at http://sbase.abc.hu/sbase. The database is freely available by anonymous 'ftp' file transfer from ftp.icgeb.trieste.it
Affiliation(s)
- J Murvai
- International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy
|
49
|
Abstract
The PROSITE database (http://www.expasy.ch/sprot/prosite.html) consists of biologically significant patterns and profiles formulated in such a way that with appropriate computational tools it can help to determine to which known family of protein (if any) a new sequence belongs, or which known domain(s) it contains.
Affiliation(s)
- K Hofmann
- MEMOREC, Stoffel GmbH, Stoeckheimer Weg 1, D-50829 Koeln, Germany, Swiss Institute of Bioinformatics (SIB), Swiss Institute for Experimental Cancer Research (ISREC), CH-1066 Epalinges/Lausanne, Switzerland
|
50
|
Périer RC, Junier T, Bonnard C, Bucher P. The Eukaryotic Promoter Database (EPD): recent developments. Nucleic Acids Res 1999; 27:307-9. [PMID: 9847211 PMCID: PMC148166 DOI: 10.1093/nar/27.1.307] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Recent efforts have focused on exhaustive cross-referencing to the EMBL nucleotide sequence database, and on the improvement of the WWW-based user interfaces and data retrieval mechanisms. EPD can be accessed at http://www.epd.isb-sib.ch
Affiliation(s)
- R C Périer
- Swiss Institute of Bioinformatics & Swiss Institute for Experimental Cancer Research, Ch. des Boveresses 155, 1066-Epalinges s/Lausanne, Switzerland
|