1
Gauthier J, Vincent AT, Charette SJ, Derome N. A brief history of bioinformatics. Brief Bioinform 2018; 20:1981-1996. [DOI: 10.1093/bib/bby063] [Received: 02/26/2018] [Revised: 06/22/2018]
Abstract
It is easy for today’s students and researchers to believe that modern bioinformatics emerged recently to assist next-generation sequencing data analysis. However, the very beginnings of bioinformatics occurred more than 50 years ago, when desktop computers were still a hypothesis and DNA could not yet be sequenced. The foundations of bioinformatics were laid in the early 1960s with the application of computational methods to protein sequence analysis (notably, de novo sequence assembly, biological sequence databases and substitution models). Later on, DNA analysis also emerged due to parallel advances in (i) molecular biology methods, which allowed easier manipulation of DNA, as well as its sequencing, and (ii) computer science, which saw the rise of increasingly miniaturized and more powerful computers, as well as novel software better suited to handle bioinformatics tasks. In the 1990s through the 2000s, major improvements in sequencing technology, along with reduced costs, gave rise to an exponential increase in data. The arrival of ‘Big Data’ has laid out new challenges in terms of data mining and management, calling for more computer science expertise in the field. Coupled with an ever-increasing number of bioinformatics tools, biological Big Data had (and continues to have) profound implications for the predictive power and reproducibility of bioinformatics results. To overcome this issue, universities are now fully integrating this discipline into the curriculum of biology students. Recent subdisciplines such as synthetic biology, systems biology and whole-cell modeling have emerged from the ever-increasing complementarity between computer science and biology.
Affiliation(s)
- Jeff Gauthier
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, 1030, av. de la Médecine, Québec, Canada
- Antony T Vincent
  - INRS-Institut Armand-Frappier, Bacterial Symbionts Evolution, 531 boul. des Prairies, Laval, QC, Canada
- Steve J Charette
  - Centre de Recherche de l'Institut Universitaire de Cardiologie et de Pneumologie de Québec (CRIUCPQ), 2725 Chemin Sainte-Foy, Québec, QC, Canada
  - Département de Biochimie, de Microbiologie et de Bio-informatique, Université Laval, Québec, Canada
- Nicolas Derome
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Département de Biologie, Université Laval, 1030, av. de la Médecine, Québec, Canada
2
Subramaniam S, Fahy E, Gupta S, Sud M, Byrnes RW, Cotter D, Dinasarapu AR, Maurya MR. Bioinformatics and systems biology of the lipidome. Chem Rev 2011; 111:6452-90. [PMID: 21939287 PMCID: PMC3383319 DOI: 10.1021/cr200295k]
Affiliation(s)
- Shankar Subramaniam
  - Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
  - San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
  - Departments of Chemistry and Biochemistry, and Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, California 92093, USA
- Eoin Fahy
  - Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Shakti Gupta
  - Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Manish Sud
  - San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Robert W. Byrnes
  - San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Dawn Cotter
  - San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, California, 92093, USA
- Ashok Reddy Dinasarapu
  - Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Mano Ram Maurya
  - Department of Bioengineering, University of California at San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
3
Wendl MC, Smith S, Pohl CS, Dooling DJ, Chinwalla AT, Crouse K, Hepler T, Leong S, Carmichael L, Nhan M, Oberkfell BJ, Mardis ER, Hillier LW, Wilson RK. Design and implementation of a generalized laboratory data model. BMC Bioinformatics 2007; 8:362. [PMID: 17897463 PMCID: PMC2194795 DOI: 10.1186/1471-2105-8-362] [Received: 06/05/2007] [Accepted: 09/26/2007]
Abstract
Background: Investigators in the biological sciences continue to exploit laboratory automation methods and have dramatically increased the rates at which they can generate data. In many environments, the methods themselves also evolve in a rapid and fluid manner. These observations point to the importance of robust information management systems in the modern laboratory. Designing and implementing such systems is non-trivial, and it appears that in many cases a database project ultimately proves unserviceable.

Results: We describe a general modeling framework for laboratory data and its implementation as an information management system. The model utilizes several abstraction techniques, focusing especially on the concepts of inheritance and meta-data. Traditional approaches commingle event-oriented data with regular entity data in ad hoc ways. Instead, we define distinct regular entity and event schemas, but fully integrate these via a standardized interface. The design allows straightforward definition of a "processing pipeline" as a sequence of events, obviating the need for separate workflow management systems. A layer above the event-oriented schema integrates events into a workflow by defining "processing directives", which act as automated project managers of items in the system. Directives can be added or modified in an almost trivial fashion, i.e., without the need for schema modification or re-certification of applications. Association between regular entities and events is managed via simple "many-to-many" relationships. We describe the programming interface, as well as techniques for handling input/output, process control, and state transitions.

Conclusion: The implementation described here has served as the Washington University Genome Sequencing Center's primary information system for several years. It handles all transactions underlying a throughput rate of about 9 million sequencing reactions of various kinds per month and has handily weathered a number of major pipeline reconfigurations. The basic data model can be readily adapted to other high-volume processing environments.
Affiliation(s)
- Michael C Wendl
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Scott Smith
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Craig S Pohl
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- David J Dooling
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Asif T Chinwalla
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Kevin Crouse
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Todd Hepler
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Shin Leong
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Lynn Carmichael
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Mike Nhan
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Elaine R Mardis
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- LaDeana W Hillier
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
- Richard K Wilson
  - Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA
4
Donofrio N, Rajagopalon R, Brown D, Diener S, Windham D, Nolin S, Floyd A, Mitchell T, Galadima N, Tucker S, Orbach MJ, Patel G, Farman M, Pampanwar V, Soderlund C, Lee YH, Dean RA. 'PACLIMS': a component LIM system for high-throughput functional genomic analysis. BMC Bioinformatics 2005; 6:94. [PMID: 15826298 PMCID: PMC1090558 DOI: 10.1186/1471-2105-6-94] [Received: 12/17/2004] [Accepted: 04/12/2005]
Abstract
BACKGROUND: Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the approximately 11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required.

RESULTS: The platform designed, named PACLIMS, was built to support our high-throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential for human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured.

CONCLUSION: Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high-throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high-throughput mutational endeavors.
Affiliation(s)
- Nicole Donofrio
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Ravi Rajagopalon
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Douglas Brown
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Stephen Diener
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Donald Windham
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Shelly Nolin
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Anna Floyd
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Thomas Mitchell
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
- Natalia Galadima
  - Department of Plant Pathology, University of Arizona, Tucson, AZ, USA
- Sara Tucker
  - Department of Plant Pathology, University of Arizona, Tucson, AZ, USA
- Marc J Orbach
  - Department of Plant Pathology, University of Arizona, Tucson, AZ, USA
- Gayatri Patel
  - Department of Plant Pathology, Plant Sciences Building, 1405 Veteran's Drive, University of Kentucky, Lexington, KY, 40546, USA
- Mark Farman
  - Department of Plant Pathology, Plant Sciences Building, 1405 Veteran's Drive, University of Kentucky, Lexington, KY, 40546, USA
- Vishal Pampanwar
  - Arizona Genomics Computational Laboratory, University of Arizona, Tucson, AZ, USA
- Cari Soderlund
  - Arizona Genomics Computational Laboratory, University of Arizona, Tucson, AZ, USA
- Yong-Hwan Lee
  - School of Agricultural Biotechnology, Seoul National University, Seoul, Korea
- Ralph A Dean
  - Department of Plant Pathology, Fungal Genomics Laboratory, North Carolina State University, Raleigh, NC, USA
5
Abstract
The automation of laboratory techniques has greatly increased the number of experiments that can be carried out in the chemical and biological sciences. Until recently, this automation has focused primarily on improving hardware. Here we argue that future advances will concentrate on intelligent software to integrate physical experimentation and results analysis with hypothesis formulation and experiment planning. To illustrate our thesis, we describe the 'Robot Scientist', the first physically implemented example of such a closed-loop system. In the Robot Scientist, experimentation is performed by a laboratory robot, hypotheses concerning the results are generated by machine learning, and experiments are allocated and selected by a combination of techniques derived from artificial intelligence research. The performance of the Robot Scientist has been evaluated by a rediscovery task based on yeast functional genomics. The Robot Scientist is proof that the integration of programmable laboratory hardware and intelligent software can be used to develop increasingly automated laboratories.
Affiliation(s)
- Ken E Whelan
  - Department of Computer Science, University of Wales, Aberystwyth, Penglais Campus, Aberystwyth, Ceredigion, UK
6
Kelley BP, Lunn MR, Root DE, Flaherty SP, Martino AM, Stockwell BR. A flexible data analysis tool for chemical genetic screens. Chem Biol 2004; 11:1495-503. [PMID: 15556000 DOI: 10.1016/j.chembiol.2004.08.026] [Received: 07/10/2004] [Revised: 08/12/2004] [Accepted: 08/31/2004]
Abstract
High-throughput assays generate immense quantities of data that require sophisticated data analysis tools. We have created a freely available software tool, SLIMS (Small Laboratory Information Management System), for chemical genetics, which facilitates the collection and analysis of large-scale chemical screening data. Compound structures, physical locations, and raw data can be loaded into SLIMS. Raw data from high-throughput assays are normalized using flexible analysis protocols, and systematic spatial errors are automatically identified and corrected. Various computational analyses are performed on tested compounds, and dilution-series data are processed using standard or user-defined algorithms. Finally, published literature associated with active compounds is automatically retrieved from Medline and processed to yield potential mechanisms of action. SLIMS provides a framework for analyzing high-throughput assay data both as a laboratory information management system and as a platform for experimental analysis.
Affiliation(s)
- Brian P Kelley
  - Department of Biological Sciences, Columbia University, Fairchild Center, MC 2406, 1212 Amsterdam Avenue, New York, NY 10027, USA
7
Xu Z, Lance B, Vargas C, Arpinar B, Bhandarkar S, Kraemer E, Kochut KJ, Miller JA, Wagner JR, Weise MJ, Wunderlich JK, Stringer J, Smulian G, Cushion MT, Arnold J. Mapping by sequencing the Pneumocystis genome using the ordering DNA sequences V3 tool. Genetics 2003; 163:1299-313. [PMID: 12702676 PMCID: PMC1462508 DOI: 10.1093/genetics/163.4.1299]
Abstract
A bioinformatics tool called ODS3 has been created for mapping by sequencing. The tool allows the creation of integrated genomic maps from genetic, physical mapping, and sequencing data, and permits an integrated genome map to be stored, retrieved, viewed, and queried in a stand-alone capacity, in a client/server relationship with the Fungal Genome Database (FGDB), and as a web-browsing tool for the FGDB. Because ODS3 is programmed in Java, the tool is platform independent; it also supports export of integrated genome-mapping data in the extensible markup language (XML) for data interchange with other genome information systems. The tool ODS3 is used to create an initial integrated genome map of the AIDS-related fungal pathogen Pneumocystis carinii. Contig dynamics would indicate that this physical map is approximately 50% complete with approximately 200 contigs. A total of 10 putative multigene families were found. Two of these putative families were previously characterized in P. carinii, namely the major surface glycoproteins (MSGs) and HSP70 proteins; three of these putative families (not previously characterized in P. carinii) were found to be similar to families encoding the HSP60 in Schizosaccharomyces pombe, the heat-shock psi protein in S. pombe, and the RNA synthetase family (i.e., MES1) in Saccharomyces cerevisiae. Physical mapping data are consistent with the 16S, 5.8S, and 26S rDNA genes being single copy in P. carinii. No other fungus outside this genus is known to have the rDNA genes in single copy.
Affiliation(s)
- Zheng Xu
  - Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
8
Bioinformatics, robust realm based upon multidisciplinary knowledge of biological data and computational techniques. Chinese Science Bulletin-Chinese 1999. [DOI: 10.1007/bf02886336]