1
Abstract
Scientific articles often contain relevant geographic information such as where field work was performed or where patients were treated. Most often, this information appears in the full-text article contents as a description in natural language including place names, with no accompanying machine-readable geographic metadata. Automatically extracting this geographic information could help conduct meta-analyses, find geographical research gaps, and retrieve articles using spatial search criteria. Research on this problem is still in its infancy, with many works manually processing corpora for locations and few cross-domain studies. In this paper, we develop a fully automatic pipeline to extract and represent relevant locations from scientific articles, applying it to two varied corpora. We obtain good performance, with full pipeline precision of 0.84 for an environmental corpus, and 0.78 for a biomedical corpus. Our results can be visualized as simple global maps, allowing human annotators to both explore corpus patterns in space and triage results for downstream analysis. Future work should not only focus on improving individual pipeline components, but also be informed by user needs derived from the potential spatial analysis and exploration of such corpora.
Affiliation(s)
- Elise Acheson
- Department of Geography, University of Zurich, Zurich, Switzerland
- Ross S. Purves
- Department of Geography, University of Zurich, Zurich, Switzerland
2
Juventia S, Jones S, Laporte M, Remans R, Villani C, Estrada-Carmona N. Text Mining National Commitments towards Agrobiodiversity Conservation and Use. Sustainability 2020; 12:715. [DOI: 10.3390/su12020715]
Abstract
Capturing countries’ commitments for measuring and monitoring progress towards certain goals, including the Sustainable Development Goals (SDGs), remains underexplored. The Agrobiodiversity Index bridges this gap by using text mining techniques to quantify countries’ commitments towards safeguarding and using agrobiodiversity for healthy diets, sustainable agriculture, and effective genetic resource management. The Index extracts potentially relevant sections of official documents, followed by manual sifting and scoring to identify agrobiodiversity-related commitments and assign scores. Our aim is to present the text mining methodology used in the Agrobiodiversity Index and the calculated commitment scores for nine countries while identifying methodological improvements to strengthen it. Our results reveal that levels of commitment towards using and protecting agrobiodiversity vary between countries, with most showing the strongest commitments to enhancing agrobiodiversity for genetic resource management followed by healthy diets. No commitments were found in any country related to some specific themes including varietal diversity, seed diversity, and functional diversity. The revised text mining methodology can be used for benchmarking, learning, and improving policies to enable conservation and sustainable use of agrobiodiversity. This low-cost, rapid, remotely applicable approach to capture and analyse policy commitments can be readily applied for tracking progress towards meeting other sustainability objectives.
3
Magge A, Weissenbacher D, Sarker A, Scotch M, Gonzalez-Hernandez G. Deep neural networks and distant supervision for geographic location mention extraction. Bioinformatics 2018; 34:i565-i573. [PMID: 29950020; PMCID: PMC6022665; DOI: 10.1093/bioinformatics/bty273]
Abstract
Motivation: Virus phylogeographers rely on DNA sequences of viruses and the locations of the infected hosts found in public sequence databases like GenBank for modeling virus spread. However, the locations in GenBank records are often only at the country or state level, and may require phylogeographers to scan the journal articles associated with the records to identify more localized geographic areas. To automate this process, we present a named entity recognizer (NER) for detecting locations in biomedical literature. We built the NER using a deep feedforward neural network to determine whether a given token is a toponym or not. To overcome the limited human-annotated data available for training, we use distant supervision techniques to generate additional samples to train our NER. Results: Our NER achieves an F1-score of 0.910 and significantly outperforms the previous state-of-the-art system. Using the additional data generated through distant supervision further boosts the performance of the NER, achieving an F1-score of 0.927. The NER presented in this research improves over previous systems significantly. Our experiments also demonstrate the NER’s capability to embed external features to further boost the system’s performance. We believe that the same methodology can be applied for recognizing similar biomedical entities in scientific literature.
Affiliation(s)
- Arjun Magge
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA; Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Davy Weissenbacher
- Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abeed Sarker
- Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Matthew Scotch
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA; Biodesign Center for Environmental Health Engineering, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
4
Magge A, Weissenbacher D, Sarker A, Scotch M, Gonzalez-Hernandez G. Bi-directional Recurrent Neural Network Models for Geographic Location Extraction in Biomedical Literature. Pac Symp Biocomput 2019; 24:100-111. [PMID: 30864314; PMCID: PMC6417823]
Abstract
Phylogeography research involving virus spread and tree reconstruction relies on accurate geographic locations of infected hosts. An insufficient level of geographic information in nucleotide sequence repositories such as GenBank motivates the use of natural language processing methods for extracting geographic location names (toponyms) from the scientific article associated with the sequence, and disambiguating the locations to their co-ordinates. In this paper, we present an extensive study of multiple recurrent neural network architectures for the task of extracting geographic locations and their effective contribution to the disambiguation task using population heuristics. The methods presented in this paper achieve a strict detection F1 score of 0.94, disambiguation accuracy of 91% and an overall resolution F1 score of 0.88, all significantly higher than those of previously developed methods, improving our capability to find the location of infected hosts and enrich metadata information.
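A population heuristic resolves an ambiguous toponym to whichever of its gazetteer candidates has the largest population. The following is a minimal Python sketch of that general idea only; the `Candidate` type, the `resolve_by_population` function, and the toy gazetteer (rounded, illustrative values) are hypothetical and do not reproduce the authors' system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One gazetteer entry that a toponym string might refer to (hypothetical type)."""
    name: str
    country: str
    lat: float
    lon: float
    population: int


def resolve_by_population(
    toponym: str, gazetteer: dict[str, list[Candidate]]
) -> Candidate | None:
    """Population heuristic: map a toponym to its most populous gazetteer candidate.

    Returns None when the toponym has no entry in the gazetteer.
    """
    candidates = gazetteer.get(toponym.lower(), [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.population)


# Toy gazetteer keyed by lower-cased name; coordinates and populations are
# rounded, illustrative values, not data from the cited study.
GAZETTEER = {
    "paris": [
        Candidate("Paris", "FR", 48.86, 2.35, 2_100_000),
        Candidate("Paris", "US", 33.66, -95.56, 25_000),
    ],
}

best = resolve_by_population("Paris", GAZETTEER)
print(best.country, best.lat, best.lon)  # FR 48.86 2.35
```

The heuristic favors well-known places, so in this sketch a mention that really refers to a small town sharing its name with a large city would be resolved to the city.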
Affiliation(s)
- Arjun Magge
- College of Health Solutions, Arizona State University, Tempe, AZ 85281, USA
- Biodesign Center for Environmental Health Engineering, Arizona State University, Tempe, AZ 85281, USA
- Davy Weissenbacher
- Department of Biostatistics, Epidemiology and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Abeed Sarker
- Department of Biostatistics, Epidemiology and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew Scotch
- College of Health Solutions, Arizona State University, Tempe, AZ 85281, USA
- Biodesign Center for Environmental Health Engineering, Arizona State University, Tempe, AZ 85281, USA
- Graciela Gonzalez-Hernandez
- Department of Biostatistics, Epidemiology and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
5
Tahsin T, Weissenbacher D, Jones-Shargani D, Magee D, Vaiente M, Gonzalez G, Scotch M. Named entity linking of geospatial and host metadata in GenBank for advancing biomedical research. Database (Oxford) 2017; 2017:4781736. [PMID: 30412219; PMCID: PMC6225896; DOI: 10.1093/database/bax093] [Received: 06/03/2017] [Revised: 11/20/2017] [Accepted: 11/21/2017]
Abstract
Database URL: https://zodo.asu.edu/zoophydb/
Affiliation(s)
- Tasnia Tahsin
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Davy Weissenbacher
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Biodesign Center for Environmental Health Engineering, Arizona State University, 781 E Terrace Mall, Tempe, AZ 85281, USA
- Demetrius Jones-Shargani
- Biodesign Center for Environmental Health Engineering, Arizona State University, 781 E Terrace Mall, Tempe, AZ 85281, USA
- Daniel Magee
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Biodesign Center for Environmental Health Engineering, Arizona State University, 781 E Terrace Mall, Tempe, AZ 85281, USA
- Matteo Vaiente
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Biodesign Center for Environmental Health Engineering, Arizona State University, 781 E Terrace Mall, Tempe, AZ 85281, USA
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Institute of Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA 19104, USA
- Matthew Scotch
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Biodesign Center for Environmental Health Engineering, Arizona State University, 781 E Terrace Mall, Tempe, AZ 85281, USA
6
D'Souza M, Sulakhe D, Wang S, Xie B, Hashemifar S, Taylor A, Dubchak I, Conrad Gilliam T, Maltsev N. Strategic Integration of Multiple Bioinformatics Resources for System Level Analysis of Biological Networks. Methods Mol Biol 2017; 1613:85-99. [PMID: 28849559; DOI: 10.1007/978-1-4939-7027-8_5]
Abstract
Recent technological advances in genomics allow the production of biological data at unprecedented tera- and petabyte scales. Efficient mining of these vast and complex datasets for the needs of biomedical research critically depends on a seamless integration of the clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships. Such experimental data accumulated in publicly available databases should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. We present an integrated computational platform Lynx (Sulakhe et al., Nucleic Acids Res 44:D882-D887, 2016) ( http://lynx.cri.uchicago.edu ), a web-based database and knowledge extraction engine. It provides advanced search capabilities and a variety of algorithms for enrichment analysis and network-based gene prioritization. It gives public access to the Lynx integrated knowledge base (LynxKB) and its analytical tools via user-friendly web services and interfaces. The Lynx service-oriented architecture supports annotation and analysis of high-throughput experimental data. Lynx tools assist the user in extracting meaningful knowledge from LynxKB and experimental data, and in the generation of weighted hypotheses regarding the genes and molecular mechanisms contributing to human phenotypes or conditions of interest. The goal of this integrated platform is to support the end-to-end analytical needs of various translational projects.
Affiliation(s)
- Mark D'Souza
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Argonne National Laboratory, Building 221, Room: A142, 9700 South Cass Avenue, Argonne, IL, 60439, USA
- Dinanath Sulakhe
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Sheng Wang
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Bing Xie
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Department of Computer Science, Illinois Institute of Technology, Chicago, IL, 60616, USA
- Somaye Hashemifar
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, 60637, USA
- Andrew Taylor
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Inna Dubchak
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- T Conrad Gilliam
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
- Natalia Maltsev
- Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, 60637, USA
- Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, 60637, USA
7
Tahsin T, Weissenbacher D, Rivera R, Beard R, Firago M, Wallstrom G, Scotch M, Gonzalez G. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records. J Am Med Inform Assoc 2016; 23:934-41. [PMID: 26911818; PMCID: PMC4997033; DOI: 10.1093/jamia/ocv172] [Received: 07/21/2015] [Revised: 10/22/2015] [Accepted: 10/22/2015]
Abstract
OBJECTIVE The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. MATERIALS AND METHODS We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. RESULTS We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. DISCUSSION Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. CONCLUSION Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles.
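The reported f-measure agrees with the balanced F1 score, the harmonic mean of precision P and recall R, F1 = 2PR/(P + R). Assuming that definition (the abstract does not name the F variant), this short Python check recomputes the f-measure from the reported precision and recall; `f_measure` is an illustrative helper, not part of the authors' system.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the RESULTS section above.
print(round(f_measure(0.832, 0.967), 3))  # 0.894
```

Because the harmonic mean is pulled toward the smaller of the two values, the f-measure (0.894) sits closer to the precision (0.832) than to the recall (0.967).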
Affiliation(s)
- Tasnia Tahsin
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Davy Weissenbacher
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Robert Rivera
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Rachel Beard
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Mari Firago
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Garrick Wallstrom
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Matthew Scotch
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA
8
Weissenbacher D, Tahsin T, Beard R, Figaro M, Rivera R, Scotch M, Gonzalez G. Knowledge-driven geospatial location resolution for phylogeographic models of virus migration. Bioinformatics 2015; 31:i348-56. [PMID: 26072502; PMCID: PMC4542781; DOI: 10.1093/bioinformatics/btv259]
Abstract
Summary: Diseases caused by zoonotic viruses (viruses transmittable between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public databases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles. Motivation: In this article, we present a system for detection and disambiguation of locations (toponym resolution) in full-text articles to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using integrated heuristics for location disambiguation including a distance heuristic, a population heuristic and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a ‘metadata heuristic’). Results: For detecting and disambiguating locations, our system performed best using the metadata heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). Precision reaches 0.88 when examining only the disambiguation of location names. Our error analysis showed that a noticeable increase in the accuracy of toponym resolution is possible by improving the geospatial location detection. By improving these fundamental automated tasks, our system can be a useful resource to phylogeographers that rely on geospatial metadata of GenBank sequences. Contact: davy.weissenbacher@asu.edu
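The abstract names a "metadata heuristic" without stating its rules. One plausible form, sketched here purely as an assumption, restricts gazetteer candidates to the country recorded in the GenBank record and then falls back to a population heuristic. The `Candidate` type, `resolve_with_metadata`, and the rounded, illustrative populations are hypothetical and do not reproduce the authors' system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A gazetteer candidate for an ambiguous toponym (hypothetical type)."""
    name: str
    country: str  # ISO 3166-1 alpha-2 code
    population: int


def resolve_with_metadata(
    candidates: list[Candidate], record_country: str | None
) -> Candidate | None:
    """Prefer candidates in the country given by the sequence record's metadata.

    If no candidate matches (or no country is known), keep all candidates;
    then break the remaining ambiguity with a population heuristic.
    """
    if not candidates:
        return None
    if record_country is not None:
        matching = [c for c in candidates if c.country == record_country]
        if matching:
            candidates = matching
    return max(candidates, key=lambda c: c.population)


# Illustrative, rounded populations: "Georgia" the country vs. the U.S. state.
georgia = [
    Candidate("Georgia", "GE", 3_700_000),
    Candidate("Georgia", "US", 10_000_000),
]
print(resolve_with_metadata(georgia, "GE").country)  # GE
print(resolve_with_metadata(georgia, None).country)  # US
```

In this sketch the record's country overrides population whenever it matches at least one candidate, which is the intuition behind using record metadata as disambiguation evidence.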
Affiliation(s)
- Davy Weissenbacher
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
- Tasnia Tahsin
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
- Rachel Beard
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
- Mari Figaro
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
- Robert Rivera
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
- Matthew Scotch
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
- Graciela Gonzalez
- Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA and Center for Environmental Security, Biodesign Institute, Arizona State University, Tempe, AZ 85287-5904, USA
9
Abstract
BACKGROUND We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task is aimed at extracting gene renaming and gene name synonymy from PubMed abstracts. The Bacteria Gene Interaction task addresses gene/protein interaction extraction from individual sentences. The interactions have been categorized into ten different sub-types, thus giving a detailed account of genetic regulations at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe the process of creation for the three corpora, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions. RESULTS Three teams submitted to the Bacteria Gene Renaming task; the best team achieved an F-score of 87%. For the Bacteria Gene Interaction task, the only participant achieved a global F-score of 77%, although the system's efficiency varies significantly from one sub-type to another. Three teams submitted to the Bacteria Biotopes task with very different approaches; the best team achieved an F-score of 45%. However, a detailed study of the participating systems' efficiency reveals the strengths and weaknesses of each. CONCLUSIONS The three tasks of the Bacteria Track offer participants a chance to address a wide range of issues in Information Extraction, including entity recognition, semantic typing and coreference resolution. We found common trends in the most efficient systems: the systematic use of syntactic dependencies and machine learning. Nevertheless, the originality of the Bacteria Biotopes task encouraged the use of interesting novel methods and techniques, such as term compositionality and scopes wider than the sentence.
Affiliation(s)
- Robert Bossy
- Mathématique Informatique et Génome, Institut National de la Recherche Agronomique, INRA UR1077 - F78352 Jouy-en-Josas, France
- Julien Jourde
- Mathématique Informatique et Génome, Institut National de la Recherche Agronomique, INRA UR1077 - F78352 Jouy-en-Josas, France
- Philippe Veber
- Mathématique Informatique et Génome, Institut National de la Recherche Agronomique, INRA UR1077 - F78352 Jouy-en-Josas, France
- Erick Alphonse
- PredictiveDB - 16, rue Alexandre Parodi - F75010 Paris, France
- Maarten van de Guchte
- MICALIS, Institut National de la Recherche Agronomique, UMR1319 - F78352 Jouy-en-Josas, France
- Philippe Bessières
- Mathématique Informatique et Génome, Institut National de la Recherche Agronomique, INRA UR1077 - F78352 Jouy-en-Josas, France
- Claire Nédellec
- Mathématique Informatique et Génome, Institut National de la Recherche Agronomique, INRA UR1077 - F78352 Jouy-en-Josas, France
10
de Lorenzo V. Genes that move the window of viability of life: lessons from bacteria thriving at the cold extreme: mesophiles can be turned into extremophiles by substituting essential genes. Bioessays 2011; 33:38-42. [PMID: 21072830; DOI: 10.1002/bies.201000101]
Abstract
Whether the occurrence of life at physicochemical extremes results from the adaptation of the entire organism to such settings or originates from the action of a few genes has long been debated. Recent evidence suggests that a limited number of functions suffice to change the predilection of microorganisms for radically different environmental scenarios. For instance, expressing a few genes from cold-loving bacteria in mesophilic hosts allows them to grow at much lower temperatures and makes them heat-sensitive. This has been exploited not only for constructing Escherichia coli strains able to grow at 5-10 °C (and thus optimised as hosts for heterologous gene expression) but also for designing vaccines based on temperature-sensitive pathogens. The occurrence of genes/functions that reframe the windows of viability may also call for a revision of some concepts in microbial ecology and may provide new tools for engineering bacteria with superior biotechnological performance.
Affiliation(s)
- Víctor de Lorenzo
- Systems and Synthetic Biology Program, Centro Nacional de Biotecnología CSIC Cantoblanco, Madrid, Spain