1
|
Kieran TJ, Maines TR, Belser JA. Eleven quick tips to unlock the power of in vivo data science. PLoS Comput Biol 2025; 21:e1012947. [PMID: 40245007 PMCID: PMC12005514 DOI: 10.1371/journal.pcbi.1012947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2025] [Open Access]
Affiliation(s)
- Troy J. Kieran
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
| | - Taronna R. Maines
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
| | - Jessica A. Belser
- Influenza Division, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
|
2
|
Novikov A, Nachychko V. The digitisation workflow of the herbarium of the State Museum of Natural History of the NAS of Ukraine (LWS). Biodivers Data J 2025; 13:e148861. [PMID: 40190478 PMCID: PMC11971642 DOI: 10.3897/bdj.13.e148861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2025] [Accepted: 03/21/2025] [Indexed: 04/09/2025] [Open Access]
Abstract
The digitisation workflow currently applied at the Herbarium of the State Museum of Natural History of the National Academy of Sciences of Ukraine (LWS) differs from similar workflows in its cascade ('object-to-data-to-image') multilevel organisation. This design is driven by the need to preselect specimens by taxon and region, as well as by batched digitisation, which occurs with significant interruptions. Focusing on certain taxonomic groups from specific regions allows us to prioritise specimens that could be more valuable for early scientific processing. At the same time, the herbarium benefits from such a digitisation model by revising the existing collection classification while keeping the initial ID system. The presented digitisation workflow can be easily reproduced in any herbarium with a limited budget. The purpose of this paper is to provide a detailed description and schemas of the principal digitisation stages applied at the LWS Herbarium and to briefly discuss the steps crucial for a successful result. The information provided should help to maintain the digitisation and choose appropriate equipment and materials. We conclude that, despite its general complexity, the described workflow has proved viable and relevant due to its robust design and focus on data quality. Despite a focus on specialists' involvement, it maintains flexibility that allows combining volunteers and, if needed, outsourced efforts. Moreover, its modularity promotes independence of the principal digitisation stages and allows long interruptions between digitisation batches.
Affiliation(s)
- Andriy Novikov
- State Museum of Natural History of the NAS of Ukraine, Lviv, Ukraine
| | - Viktor Nachychko
- Ivan Franko National University of Lviv, Lviv, Ukraine
|
3
|
Poisot T, Becker DJ, Brookson CB, Graeden E, Ryan SJ, Turon G, Carlson C. Ten quick tips to build a Model Life Cycle. PLoS Comput Biol 2025; 21:e1012731. [PMID: 39899593 PMCID: PMC11790144 DOI: 10.1371/journal.pcbi.1012731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2025] [Open Access]
Affiliation(s)
- Timothée Poisot
- Département de Sciences Biologiques, Université de Montréal, Montréal, Québec, Canada
- Québec Centre for Biodiversity Science, Montréal, Québec, Canada
| | - Daniel J. Becker
- School of Biological Sciences, University of Oklahoma, Norman, Oklahoma, United States of America
| | - Cole B. Brookson
- Département de Sciences Biologiques, Université de Montréal, Montréal, Québec, Canada
- Yale University School of Public Health, New Haven, CT, United States of America
| | - Ellie Graeden
- Yale University School of Public Health, New Haven, CT, United States of America
- Massive Data Institute, Georgetown University, Washington, DC, United States of America
| | - Sadie J. Ryan
- Department of Geography and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
- College of Life Sciences, University of KwaZulu Natal, Durban, South Africa
| | - Gemma Turon
- Ersilia Open Source Initiative, Barcelona, Spain
| | - Colin Carlson
- Yale University School of Public Health, New Haven, CT, United States of America
|
4
|
Hertz MI, McNeill AS. Eleven quick tips for properly handling tabular data. PLoS Comput Biol 2024; 20:e1012604. [PMID: 39602455 PMCID: PMC11602091 DOI: 10.1371/journal.pcbi.1012604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024] [Open Access]
Affiliation(s)
- Marla I. Hertz
- University of Alabama at Birmingham Libraries, Birmingham, Alabama, United States of America
| | - Ashley S. McNeill
- University of Alabama at Birmingham Libraries, Birmingham, Alabama, United States of America
|
5
|
Gupta A, Kumar S, Kumar A. Big Data in Bioinformatics and Computational Biology: Basic Insights. Methods Mol Biol 2024; 2719:153-166. [PMID: 37803117 DOI: 10.1007/978-1-0716-3461-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2023]
Abstract
The human genome was first sequenced in 1994. It took 10 years of cooperation between numerous international research organizations to reveal a preliminary human DNA sequence. Genomics labs can now sequence an entire genome in only a few days. Here, we discuss how the advent of high-performance sequencing platforms has paved the way for Big Data in biology and contributed to the development of modern bioinformatics, which in turn has helped to expand the scope of biology and allied sciences. New technologies and methodologies for the storage, management, analysis, and visualization of big data have been shown to be necessary. Modern bioinformatics must not only process massive amounts of heterogeneous data but also deal with different ways of interpreting and presenting the results, as well as with different software programs and file formats. This chapter attempts to present solutions to these problems. To store massive amounts of data and complete search queries within a reasonable time, new database management systems other than relational ones will be necessary. Emerging advanced programming approaches, such as machine learning, Hadoop, and MapReduce, aim to make it easy to construct one's own scripts for data processing and to address the diversity of genomic and proteomic data formats in bioinformatics.
Affiliation(s)
- Aanchal Gupta
- University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
| | - Shubham Kumar
- University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
| | - Ashwani Kumar
- University Institute of Biotechnology, Chandigarh University, Mohali, Punjab, India
|
6
|
Berezin CT, Aguilera LU, Billerbeck S, Bourne PE, Densmore D, Freemont P, Gorochowski TE, Hernandez SI, Hillson NJ, King CR, Köpke M, Ma S, Miller KM, Moon TS, Moore JH, Munsky B, Myers CJ, Nicholas DA, Peccoud SJ, Zhou W, Peccoud J. Ten simple rules for managing laboratory information. PLoS Comput Biol 2023; 19:e1011652. [PMID: 38060459 PMCID: PMC10703290 DOI: 10.1371/journal.pcbi.1011652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023] [Open Access]
Abstract
Information is the cornerstone of research, from experimental (meta)data and computational processes to complex inventories of reagents and equipment. These 10 simple rules discuss best practices for leveraging laboratory information management systems to transform this large information load into useful scientific findings.
Affiliation(s)
- Casey-Tyler Berezin
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Luis U. Aguilera
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Sonja Billerbeck
- Molecular Microbiology Unit, Faculty of Science and Engineering, University of Groningen, Groningen, the Netherlands
| | - Philip E. Bourne
- School of Data Science, University of Virginia, Charlottesville, Virginia, United States of America
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
| | - Douglas Densmore
- College of Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Paul Freemont
- Department of Infectious Disease, Imperial College, London, United Kingdom
| | - Thomas E. Gorochowski
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- BrisEngBio, University of Bristol, Bristol, United Kingdom
| | - Sarah I. Hernandez
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Nathan J. Hillson
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- US Department of Energy Agile BioFoundry, Emeryville, California, United States of America
- US Department of Energy Joint BioEnergy Institute, Emeryville, California, United States of America
| | - Connor R. King
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Michael Köpke
- LanzaTech, Skokie, Illinois, United States of America
| | - Shuyi Ma
- Center for Global Infectious Disease Research, Seattle Children’s Hospital, University of Washington Medicine, Seattle, Washington, United States of America
| | - Katie M. Miller
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Tae Seok Moon
- Department of Energy, Environmental & Chemical Engineering, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, United States of America
| | - Brian Munsky
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Chris J. Myers
- Department of Electrical, Computer & Energy Engineering, University of Colorado Boulder, Boulder, Colorado, United States of America
| | - Dequina A. Nicholas
- Department of Molecular Biology & Biochemistry, University of California Irvine, Irvine, California, United States of America
| | - Samuel J. Peccoud
- Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, Colorado, United States of America
| | - Wen Zhou
- Department of Statistics, Colorado State University, Fort Collins, Colorado, United States of America
| | - Jean Peccoud
- Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
|
7
|
Dillon EM, Dunne EM, Womack TM, Kouvari M, Larina E, Claytor JR, Ivkić A, Juhn M, Carmona PSM, Robson SV, Saha A, Villafaña JA, Zill ME. Challenges and directions in analytical paleobiology. Paleobiology 2023; 49:377-393. [PMID: 37809321 PMCID: PMC7615171 DOI: 10.1017/pab.2023.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Over the last 50 years, access to new data and analytical tools has expanded the study of analytical paleobiology, contributing to innovative analyses of biodiversity dynamics over Earth's history. Despite, or even spurred by, this growing availability of resources, analytical paleobiology faces deep-rooted obstacles that stem from the need for more equitable access to data and best practices to guide analyses of the fossil record. Recent progress has been accelerated by a collective push toward more collaborative, interdisciplinary, and open science, especially by early-career researchers. Here, we survey four challenges facing analytical paleobiology from an early-career perspective: (1) accounting for biases when interpreting the fossil record; (2) integrating fossil and modern biodiversity data; (3) building data science skills; and (4) increasing data accessibility and equity. We discuss recent efforts to address each challenge, highlight persisting barriers, and identify tools that have advanced analytical work. Given the inherent linkages between these challenges, we encourage discourse across disciplines to find common solutions. We also affirm the need for systemic changes that reevaluate how we conduct and share paleobiological research.
Affiliation(s)
- Erin M. Dillon
- Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, California 93106, U.S.A.; Smithsonian Tropical Research Institute, Balboa, Republic of Panama
| | - Emma M. Dunne
- GeoZentrum Nordbayern, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 91054 Erlangen, Germany; School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
| | - Tom M. Womack
- School of Geography, Environment and Earth Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand
| | - Miranta Kouvari
- Department of Earth Sciences, University College London, Gower Street, London WC1E 6BT, United Kingdom; Life Sciences Department, Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom
| | - Ekaterina Larina
- Jackson School of Geosciences, University of Texas, Austin, Texas 78712, U.S.A
| | - Jordan Ray Claytor
- Department of Biology, University of Washington, Seattle, Washington 98195, U.S.A; Burke Museum of Natural History and Culture, Seattle, Washington 98195, U.S.A
| | - Angelina Ivkić
- Department of Palaeontology, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria
| | - Mark Juhn
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California 90095, U.S.A
| | - Pablo S. Milla Carmona
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Ciencias Geológicas, Buenos Aires C1428EGA, Argentina; Instituto de Estudios Andinos “Don Pablo Groeber” (IDEAN, UBA-CONICET), Buenos Aires C1428EGA, Argentina
| | - Selina Viktor Robson
- Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
| | - Anwesha Saha
- Institute of Palaeobiology, Polish Academy of Sciences, ul. Twarda 51/55, 00-818 Warsaw, Poland; Laboratory of Paleogenetics and Conservation Genetics, Centre of New Technologies (CeNT), University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland
| | - Jaime A. Villafaña
- Department of Palaeontology, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria; Centro de Investigación en Recursos Naturales y Sustentabilidad, Universidad Bernardo O'Higgins, Santiago 8370993, Chile
| | - Michelle E. Zill
- Department of Earth and Planetary Sciences, University of California Riverside, Riverside, California 92521, U.S.A
|
8
|
Kalantari A, Szczepanik M, Heunis S, Mönch C, Hanke M, Wachtler T, Aswendt M. How to establish and maintain a multimodal animal research dataset using DataLad. Sci Data 2023; 10:357. [PMID: 37277500 DOI: 10.1038/s41597-023-02242-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/15/2023] [Indexed: 06/07/2023] [Open Access]
Abstract
Sharing of data, processing tools, and workflows requires open data hosting services and management tools. Despite FAIR guidelines and the increasing demand from funding agencies and publishers, only a few animal studies share all experimental data and processing tools. We present a step-by-step protocol to perform version control and remote collaboration for large multimodal datasets. A data management plan was introduced to ensure data security in addition to a homogeneous file and folder structure. Changes to the data were automatically tracked using DataLad and all data was shared on the research data platform GIN. This simple and cost-effective workflow facilitates the adoption of FAIR data logistics and processing workflows by making the raw and processed data available and providing the technical infrastructure to independently reproduce the data processing steps. It enables the community to collect heterogeneously acquired and stored datasets not limited to a specific category of data and serves as a technical infrastructure blueprint with rich potential to improve data handling at other sites and extend to other research areas.
Affiliation(s)
- Aref Kalantari
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Neurology, Cologne, Germany
| | - Michał Szczepanik
- Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
| | - Stephan Heunis
- Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
| | - Christian Mönch
- Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
| | - Michael Hanke
- Psychoinformatics Lab, Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Thomas Wachtler
- Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, München, Germany
| | - Markus Aswendt
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Neurology, Cologne, Germany.
- Cognitive Neuroscience, Institute of Neuroscience and Medicine (INM-3), Research Centre Jülich, Jülich, Germany.
|
9
|
Kosterlitz O, Huisman JS. Guidelines for the estimation and reporting of plasmid conjugation rates. Plasmid 2023; 126:102685. [PMID: 37121291 DOI: 10.1016/j.plasmid.2023.102685] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/27/2023] [Revised: 04/15/2023] [Accepted: 04/27/2023] [Indexed: 05/02/2023]
Abstract
Conjugation is a central characteristic of plasmid biology and an important mechanism of horizontal gene transfer in bacteria. However, there is little consensus on how to accurately estimate and report plasmid conjugation rates, in part due to the wide range of available methods. Given the similarity between approaches, we propose general reporting guidelines for plasmid conjugation experiments. These constitute best practices based on recent literature about plasmid conjugation and methods to measure conjugation rates. In addition to the general guidelines, we discuss common theoretical assumptions underlying existing methods to estimate conjugation rates and provide recommendations on how to avoid violating these assumptions. We hope this will aid the implementation and evaluation of conjugation rate measurements, and initiate a broader discussion regarding the practice of quantifying plasmid conjugation rates.
Affiliation(s)
- Olivia Kosterlitz
- Department of Biology, University of Washington, 3747 W Stevens Way NE, Life Sciences Bldg, Seattle, Washington, United States of America.
| | - Jana S Huisman
- Institute of Integrative Biology, Department of Environmental Systems Science, ETH Zurich, Universitätsstrasse 16, 8092 Zürich, Switzerland.
|
10
|
A literature review on digital content management: trends and future challenges. Digital Library Perspectives 2023. [DOI: 10.1108/dlp-03-2022-0024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Purpose
The purpose of this study is to review the literature on digital content management (DCM) published between 2001 and 2021, as well as to provide insights and research directions for the future.
Design/methodology/approach
This study followed the systematic literature review framework PRISMA for reviewing existing literature on DCM. The PRISMA checklist helps the researcher in refining the reporting of the review paper. Data was collected from Scopus and Web of Science (WoS) databases. A total of 136 documents were selected for analysis from Scopus and WoS.
Findings
Based on current papers, this study attempted to discuss some key DCM trends and themes. Seven themes have been identified in the literature: virtual reality and its implications on DCM; personal DCM; microservices-based DCM; model for DCM; DCM using Bluetooth Low Energy technology; DCM software; and DCM codification. This study identifies influential authors, top contributing countries, top contributing institutions, most cited papers, most common title words and contributions by fields.
Originality/value
The findings of this study, as well as future research projects, open the path for more research and contributions to the field.
|
11
|
Oza VH, Whitlock JH, Wilk EJ, Uno-Antonison A, Wilk B, Gajapathy M, Howton TC, Trull A, Ianov L, Worthey EA, Lasseigne BN. Ten simple rules for using public biological data for your research. PLoS Comput Biol 2023; 19:e1010749. [PMID: 36602970 DOI: 10.1371/journal.pcbi.1010749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023] [Open Access]
Abstract
With an increasing amount of biological data available publicly, there is a need for a guide on how to successfully download and use this data. The 10 simple rules for using public biological data are: (1) use public data purposefully in your research; (2) evaluate data for your use case; (3) check data reuse requirements and embargoes; (4) be aware of ethics for data reuse; (5) plan for data storage and compute requirements; (6) know what you are downloading; (7) download programmatically and verify integrity; (8) properly cite data; (9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share; and (10) make pipelines and code FAIR and share. These rules are intended as a guide for researchers wanting to make use of available data and to increase data reuse and reproducibility.
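Rule 7 in the abstract above (download programmatically and verify integrity) can be illustrated with a minimal sketch. It assumes the data provider publishes a SHA-256 checksum alongside each file; the function names `sha256_of_file` and `verify_download` are hypothetical and are not taken from the cited paper.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's digest matches the published checksum.

    Assumption: the repository lists a SHA-256 value for the file.
    Whitespace and letter case in the published value are ignored.
    """
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A typical use would be to call `verify_download` immediately after each scripted download and to stop the pipeline if it returns `False`. Repositories that publish MD5 sums instead would need `hashlib.md5` in place of `hashlib.sha256`.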
Affiliation(s)
- Vishal H Oza
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Jordan H Whitlock
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Elizabeth J Wilk
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Angelina Uno-Antonison
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Brandon Wilk
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Manavalan Gajapathy
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Timothy C Howton
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Austyn Trull
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Lara Ianov
- Civitan International Research Center, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Elizabeth A Worthey
- Center for Computational Genomics and Data Sciences, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pediatrics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Pathology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Brittany N Lasseigne
- Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, Alabama, United States of America
|
12
|
Wood-Charlson EM, Crockett Z, Erdmann C, Arkin AP, Robinson CB. Ten simple rules for getting and giving credit for data. PLoS Comput Biol 2022; 18:e1010476. [PMID: 36173960 PMCID: PMC9521804 DOI: 10.1371/journal.pcbi.1010476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022] [Open Access]
Affiliation(s)
- Elisha M. Wood-Charlson
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Zachary Crockett
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States of America
| | - Chris Erdmann
- American Geophysical Union, Washington, DC, United States of America
| | - Adam P. Arkin
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Carly B. Robinson
- U.S. Department of Energy Office of Scientific and Technical Information, Oak Ridge, Tennessee, United States of America
|
13
|
Dewidar O, Elmestekawy N, Welch V. Improving equity, diversity, and inclusion in academia. Res Integr Peer Rev 2022; 7:4. [PMID: 35786782 PMCID: PMC9251949 DOI: 10.1186/s41073-022-00123-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 01/20/2022] [Accepted: 05/26/2022] [Indexed: 01/10/2023] [Open Access]
Abstract
There are growing bodies of evidence demonstrating the benefits of equity, diversity, and inclusion (EDI) on academic and organizational excellence. In turn, some editors have stated their desire to improve the EDI of their journals and of the wider scientific community. The Royal Society of Chemistry established a minimum set of requirements aimed at improving EDI in scholarly publishing. Additionally, several resources were reported to have the potential to improve EDI, but their effectiveness and feasibility are yet to be determined. In this commentary we suggest six approaches, based on the Royal Society of Chemistry set of requirements, that journals could implement to improve EDI. They are: (1) adopt a journal EDI statement with clear, actionable steps to achieve it; (2) promote the use of inclusive and bias-free language; (3) appoint a journal’s EDI director or lead; (4) establish an EDI mentoring approach; (5) monitor adherence to EDI principles; and (6) publish reports on EDI actions and achievements. We also provide examples of journals that have implemented some of these strategies, and discuss the roles of peer reviewers, authors, researchers, academic institutes, and funders in improving EDI.
Affiliation(s)
- Omar Dewidar
- Bruyere Research Institute, University of Ottawa, Ottawa, ON, Canada.
| | - Nour Elmestekawy
- Bruyere Research Institute, University of Ottawa, Ottawa, ON, Canada; Faculty of Health Sciences, University of Ottawa, Ottawa, ON, Canada
| | - Vivian Welch
- Bruyere Research Institute, University of Ottawa, Ottawa, ON, Canada; School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
|
14
|
Cuny AP, Schlottmann FP, Ewald JC, Pelet S, Schmoller KM. Live cell microscopy: From image to insight. Biophysics Reviews 2022; 3:021302. [PMID: 38505412 PMCID: PMC10903399 DOI: 10.1063/5.0082799] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/18/2021] [Accepted: 03/18/2022] [Indexed: 03/21/2024]
Abstract
Live-cell microscopy is a powerful tool that can reveal cellular behavior as well as the underlying molecular processes. A key advantage of microscopy is that by visualizing biological processes, it can provide direct insights. Nevertheless, live-cell imaging can be technically challenging and prone to artifacts. For a successful experiment, many careful decisions are required at all steps from hardware selection to downstream image analysis. Facing these questions can be particularly intimidating due to the requirement for expertise in multiple disciplines, ranging from optics, biophysics, and programming to cell biology. In this review, we aim to summarize the key points that need to be considered when setting up and analyzing a live-cell imaging experiment. While we put a particular focus on yeast, many of the concepts discussed are applicable also to other organisms. In addition, we discuss reporting and data sharing strategies that we think are critical to improve reproducibility in the field.
Affiliation(s)
| | - Fabian P. Schlottmann
- Interfaculty Institute of Cell Biology, University of Tuebingen, 72076 Tuebingen, Germany
| | - Jennifer C. Ewald
- Interfaculty Institute of Cell Biology, University of Tuebingen, 72076 Tuebingen, Germany
| | - Serge Pelet
- Department of Fundamental Microbiology, University of Lausanne, 1015 Lausanne, Switzerland
| | | |
|
15
|
Zitomer RA, Karr J, Kerstens M, Perry L, Ruth K, Adrean L, Austin S, Cornelius J, Dachenhaus J, Dinkins J, Harrington A, Kim H, Owens T, Revekant C, Schroeder V, Sink C, Valente JJ, Woodis E, Rivers JW. Ten simple rules for getting started with statistics in graduate school. PLoS Comput Biol 2022; 18:e1010033. [PMID: 35446846 PMCID: PMC9022819 DOI: 10.1371/journal.pcbi.1010033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
Affiliation(s)
- Rachel A. Zitomer
- Department of Forest Ecosystems and Society, Oregon State University, Corvallis, Oregon, United States of America
| | - Jessica Karr
- Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America
| | - Mark Kerstens
- Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
| | - Lindsey Perry
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Kayla Ruth
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Lindsay Adrean
- Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
| | - Suzanne Austin
- Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America
| | - Jamie Cornelius
- Department of Integrative Biology, Oregon State University, Corvallis, Oregon, United States of America
| | - Jonathan Dachenhaus
- Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
| | - Jonathan Dinkins
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Alan Harrington
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Hankyu Kim
- Department of Forest Ecosystems and Society, Oregon State University, Corvallis, Oregon, United States of America
| | - Terrah Owens
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Claire Revekant
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Vanessa Schroeder
- Department of Animal and Rangeland Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Chelsea Sink
- Department of Fisheries, Wildlife, and Conservation Sciences, Oregon State University, Corvallis, Oregon, United States of America
| | - Jonathon J. Valente
- Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
- Smithsonian Conservation Biology Institute, Migratory Bird Center, National Zoological Park, Washington, DC, United States of America
| | - Ethan Woodis
- Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
| | - James W. Rivers
- Department of Forest Engineering, Resources, and Management, Oregon State University, Corvallis, Oregon, United States of America
| |
|
16
|
Mani V, Kavitha C, Band SS, Mosavi A, Hollins P, Palanisamy S. A Recommendation System Based on AI for Storing Block Data in the Electronic Health Repository. Front Public Health 2022; 9:831404. [PMID: 35127632 PMCID: PMC8814315 DOI: 10.3389/fpubh.2021.831404] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/08/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022] Open
Abstract
The proliferation of wearable sensors that record physiological signals has resulted in an exponential growth of data on digital health. To select the appropriate repository for the increasing amount of collected data, intelligent procedures are becoming increasingly necessary. However, allocating storage space is a nuanced process. Generally, patients have some input in choosing which repository to use, although they are not always responsible for this decision. Patients are likely to have idiosyncratic storage preferences based on their unique circumstances. The purpose of the current study is to develop a new predictive model of health data storage to meet the needs of patients while ensuring rapid storage decisions, even when data is streaming from wearable devices. To create the machine learning classifier, we used a training set synthesized from small samples of experts who exhibited correlations between health data and storage features. The results confirm the validity of the machine learning methodology.
Affiliation(s)
- Vinodhini Mani
- Department of Computer Science and Engineering, School of Computing, Sathyabama Institute of Science and Technology, Chennai, India
- *Correspondence: Vinodhini Mani
| | - C. Kavitha
- Department of Computer Science and Engineering, School of Computing, Sathyabama Institute of Science and Technology, Chennai, India
| | - Shahab S. Band
- Future Technology Research Center, College of Future, National Yunlin University of Science and Technology, Yunlin, Taiwan
| | - Amir Mosavi
- Faculty of Civil Engineering, TU-Dresden, Dresden, Germany
- Institute of Information Society, University of Public Service, Budapest, Hungary
- John von Neumann Faculty of Informatics, Obuda University, Budapest, Hungary
| | - Paul Hollins
- Cultural Research Development School of Arts, Institute of Management, University of Bolton, Bolton, United Kingdom
| | | |
|
17
|
Trunschke A. Prospects and challenges for autonomous catalyst discovery viewed from an experimental perspective. Catal Sci Technol 2022. [DOI: 10.1039/d2cy00275b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Autonomous catalysis research requires elaborate integration of operando experiments into automated workflows. Suitable experimental data for analysis by artificial intelligence can be measured more readily according to standard operating procedures.
Affiliation(s)
- Annette Trunschke
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Department of Inorganic Chemistry, Faradayweg 4-6, 14195 Berlin, Germany
| |
|
18
|
Tierney NJ, Ram K. Common-sense approaches to sharing tabular data alongside publication. Patterns (New York, N.Y.) 2021; 2:100368. [PMID: 34950899 PMCID: PMC8672137 DOI: 10.1016/j.patter.2021.100368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Numerous arguments strongly support the practice of open science, which offers several societal and individual benefits. For individual researchers, sharing research artifacts such as data can increase trust and transparency, improve the reproducibility of one's own work, and catalyze new collaborations. Despite a general appreciation of the benefits of data sharing, research data are often only available to the original investigators. For data that are shared, a lack of useful metadata and documentation makes them challenging to reuse. In this paper, we argue that a lack of incentives and infrastructure for making data useful is the biggest barrier to creating a culture of widespread data sharing. We compare data with code, examine computational environments in the context of their ability to facilitate the reproducibility of research, provide some practical guidance on how one can improve the chances of their data being reusable, and partially bridge the incentive gap. While previous papers have focused on describing ideal best practices for data and code, we focus on common-sense ideas for sharing tabular data for a target audience of academics working in data science-adjacent fields who are about to submit for publication.
Affiliation(s)
- Nicholas J Tierney
- Monash University, Department of Econometrics and Business Statistics, Melbourne, Australia
- Australian Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), Melbourne, Australia
- Telethon Kids Institute, Perth Children's Hospital, Perth, Australia
| | - Karthik Ram
- Berkeley Institute for Data Science, University of California, Berkeley, USA
| |
|
19
|
Thompson RM, Hall J, Morrison C, Palmer NR, Roberts DL. Ethics and governance for internet-based conservation science research. Conservation Biology: The Journal of the Society for Conservation Biology 2021; 35:1747-1754. [PMID: 34057267 DOI: 10.1111/cobi.13778] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/12/2020] [Revised: 04/23/2021] [Accepted: 05/19/2021] [Indexed: 06/12/2023]
Abstract
Internet-based research is increasingly important for conservation science and has wide-ranging applications and contexts, including culturomics, illegal wildlife trade, and citizen science. However, online research methods pose a range of ethical and legal challenges. Online data may be protected by copyright, database rights, or contract law. Privacy rights may also restrict the use and access of data, as well as ethical requirements from institutions. Online data have real-world meaning, and the ethical treatment of individuals and communities must not be marginalized when conducting internet-based research. As ethics frameworks originally developed for biomedical applications are inadequate for these methods, we propose that research activities involving the analysis of preexisting online data be treated analogously to offline social science methods, in particular, nondeceptive covert observation. By treating internet users and their data with respect and due consideration, conservationists can uphold the public trust needed to effectively address real-world issues.
Affiliation(s)
- Ruth M Thompson
- Durrell Institute of Conservation and Ecology, School of Anthropology and Conservation, University of Kent, Canterbury, Kent, UK
| | - Jordan Hall
- Information Compliance Office, Darwin College, University of Kent, Canterbury, Kent, UK
| | - Chris Morrison
- Copyright, Licensing & Policy, Information Services, Templeman Library, University of Kent, Canterbury, Kent, UK
| | - Nicole R Palmer
- Research Ethics and Governance, Research Services, The Registry, University of Kent, Canterbury, Kent, UK
| | - David L Roberts
- Durrell Institute of Conservation and Ecology, School of Anthropology and Conservation, University of Kent, Canterbury, Kent, UK
- Department of Zoology, University of Oxford, Oxford, UK
- Oxford Martin School, University of Oxford, Oxford, UK
| |
|
20
|
Sanchez R, Griffin BA, Pane J, McCaffrey DF. Best practices in statistical computing. Stat Med 2021; 40:6057-6068. [PMID: 34486156 PMCID: PMC9662695 DOI: 10.1002/sim.9169] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/04/2021] [Revised: 08/03/2021] [Accepted: 08/05/2021] [Indexed: 12/11/2022]
Abstract
The world is becoming increasingly complex, both in terms of the rich sources of data we have access to and the statistical and computational methods we can use on data. These factors create an ever-increasing risk for errors in code and the sensitivity of findings to data preparation and the execution of complex statistical and computing methods. The consequences of coding and data mistakes can be substantial. In this paper, we describe the key steps for implementing a code quality assurance (QA) process that researchers can follow to improve their coding practices throughout a project to assure the quality of the final data, code, analyses, and results. These steps include: (i) adherence to principles for code writing and style that follow best practices; (ii) clear written documentation that describes code, workflow, and key analytic decisions; (iii) careful version control; (iv) good data management; and (v) regular testing and review. Following these steps will greatly improve the ability of a study to assure results are accurate and reproducible. The responsibility for code QA falls not only on individual researchers but also on institutions, journals, and funding agencies.
Affiliation(s)
| | | | - Joseph Pane
- RAND Corporation, Pittsburgh, Pennsylvania, USA
| | | |
|
21
|
|
22
|
Samuel S, König-Ries B. Understanding experiments and research practices for reproducibility: an exploratory study. PeerJ 2021; 9:e11140. [PMID: 33976964 PMCID: PMC8067906 DOI: 10.7717/peerj.11140] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/09/2020] [Accepted: 03/01/2021] [Indexed: 11/20/2022] Open
Abstract
Scientific experiments and research practices vary across disciplines. The research practices followed by scientists in each domain play an essential role in the understandability and reproducibility of results. The "Reproducibility Crisis", where researchers find difficulty in reproducing published results, is currently faced by several disciplines. To understand the underlying problem in the context of the reproducibility crisis, it is important to first know the different research practices followed in their domain and the factors that hinder reproducibility. We performed an exploratory study by conducting a survey addressed to researchers representing a range of disciplines to understand scientific experiments and research practices for reproducibility. The survey findings identify a reproducibility crisis and a strong need for sharing data, code, methods, steps, and negative and positive results. Insufficient metadata, lack of publicly available data, and incomplete information in study methods are considered to be the main reasons for poor reproducibility. The survey results also address a wide number of research questions on the reproducibility of scientific results. Based on the results of our explorative study and supported by the existing published literature, we offer general recommendations that could help the scientific community to understand, reproduce, and reuse experimental data and results in the research data lifecycle.
Affiliation(s)
- Sheeba Samuel
- Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Thuringia, Germany
- Michael Stifel Center Jena, Jena, Thuringia, Germany
| | - Birgitta König-Ries
- Heinz Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena, Thuringia, Germany
- Michael Stifel Center Jena, Jena, Thuringia, Germany
| |
|
23
|
Löffler F, Wesp V, König-Ries B, Klan F. Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs? PLoS One 2021; 16:e0246099. [PMID: 33760822 PMCID: PMC7990268 DOI: 10.1371/journal.pone.0246099] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/05/2019] [Accepted: 01/13/2021] [Indexed: 11/19/2022] Open
Abstract
The increasing amount of publicly available research data provides the opportunity to link and integrate data in order to create and prove novel hypotheses, to repeat experiments or to compare recent data to data collected at a different time or place. However, recent studies have shown that retrieving relevant data for data reuse is a time-consuming task in daily research practice. In this study, we explore what hampers dataset retrieval in biodiversity research, a field that produces a large amount of heterogeneous data. In particular, we focus on scholarly search interests and metadata, the primary source of data in a dataset retrieval system. We show that existing metadata currently poorly reflect information needs and therefore are the biggest obstacle in retrieving relevant data. Our findings indicate that for data seekers in the biodiversity domain, environments, materials and chemicals, species, biological and chemical processes, locations, data parameters and data types are important information categories. These interests are well covered in metadata elements of domain-specific standards. However, instead of utilizing these standards, large data repositories tend to use metadata standards with domain-independent metadata fields that cover search interests only to some extent. A second problem is the arbitrary keywords used in descriptive fields such as title, description or subject. Keywords support scholars in a full-text search only if the provided terms syntactically match or their semantic relationship to terms used in a user query is known.
Affiliation(s)
- Felicitas Löffler
- Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
| | - Valentin Wesp
- Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
| | - Birgitta König-Ries
- Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
- German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
| | - Friederike Klan
- Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
- Citizen Science Group, DLR-Institute of Data Science, German Aerospace Center, Jena, Germany
| |
|
24
|
Caufield JH, Fu J, Wang D, Guevara-Gonzalez V, Wang W, Ping P. A Second Look at FAIR in Proteomic Investigations. J Proteome Res 2021; 20:2182-2186. [PMID: 33719446 DOI: 10.1021/acs.jproteome.1c00177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Proteomics is, by definition, comprehensive and large-scale, seeking to unravel ome-level protein features with phenotypic information on an entire system, an organ, cells, or organisms. This scope consistently involves and extends beyond single experiments. Multitudinous resources now exist to assist in making the results of proteomics experiments more findable, accessible, interoperable, and reusable (FAIR), yet many tools are still waiting to be adopted by our community. Here we highlight strategies for expanding the impact of proteomics data beyond single studies. We show how linking specific terminologies, identifiers, and text (words) can unify individual data points across a wide spectrum of studies and, more importantly, how this approach may potentially reveal novel relationships. In this effort, we explain how data sets and methods can be rendered more linkable and how this maximizes their value. We also include a discussion on how data linking strategies benefit stakeholders across the proteomics community and beyond.
|
25
|
Fer I, Gardella AK, Shiklomanov AN, Campbell EE, Cowdery EM, De Kauwe MG, Desai A, Duveneck MJ, Fisher JB, Haynes KD, Hoffman FM, Johnston MR, Kooper R, LeBauer DS, Mantooth J, Parton WJ, Poulter B, Quaife T, Raiho A, Schaefer K, Serbin SP, Simkins J, Wilcox KR, Viskari T, Dietze MC. Beyond ecosystem modeling: A roadmap to community cyberinfrastructure for ecological data-model integration. Global Change Biology 2021; 27:13-26. [PMID: 33075199 PMCID: PMC7756391 DOI: 10.1111/gcb.15409] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/24/2020] [Accepted: 09/16/2020] [Indexed: 05/10/2023]
Abstract
In an era of rapid global change, our ability to understand and predict Earth's natural systems is lagging behind our ability to monitor and measure changes in the biosphere. Bottlenecks to informing models with observations have reduced our capacity to fully exploit the growing volume and variety of available data. Here, we take a critical look at the information infrastructure that connects ecosystem modeling and measurement efforts, and propose a roadmap to community cyberinfrastructure development that can reduce the divisions between empirical research and modeling and accelerate the pace of discovery. A new era of data-model integration requires investment in accessible, scalable, and transparent tools that integrate the expertise of the whole community, including both modelers and empiricists. This roadmap focuses on five key opportunities for community tools: the underlying foundations of community cyberinfrastructure; data ingest; calibration of models to data; model-data benchmarking; and data assimilation and ecological forecasting. This community-driven approach is a key to meeting the pressing needs of science and society in the 21st century.
Affiliation(s)
- Istem Fer
- Finnish Meteorological Institute, Helsinki, Finland
| | - Anthony K. Gardella
- Department of Earth and Environment, Boston University, Boston, MA, USA
- School for Environment and Sustainability, University of Michigan, Ann Arbor, MI, USA
| | | | | | | | - Martin G. De Kauwe
- ARC Centre of Excellence for Climate Extremes, Sydney, NSW, Australia
- Climate Change Research Centre, University of New South Wales, Sydney, NSW, Australia
- Evolution & Ecology Research Centre, University of New South Wales, Sydney, NSW, Australia
| | - Ankur Desai
- Department of Atmospheric and Oceanic Sciences, University of Wisconsin‐Madison, Madison, WI, USA
| | | | - Joshua B. Fisher
- Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA
| | | | - Forrest M. Hoffman
- Computational Earth Sciences Group and Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, TN, USA
| | - Miriam R. Johnston
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Rob Kooper
- NCSA (National Center for Supercomputing Applications), University of Illinois at Urbana Champaign, Urbana, IL, USA
| | - David S. LeBauer
- College of Agriculture and Life Sciences, University of Arizona, Tucson, AZ, USA
| | | | - William J. Parton
- Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, CO, USA
| | - Benjamin Poulter
- Biospheric Sciences Laboratory (618), NASA Goddard Space Flight Center, Greenbelt, MD, USA
| | - Tristan Quaife
- UK National Centre for Earth Observation and Department of Meteorology, University of Reading, Reading, UK
| | - Ann Raiho
- Fish, Wildlife, and Conservation Biology Department, Colorado State University, Fort Collins, CO, USA
| | - Kevin Schaefer
- National Snow and Ice Data Center, Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO, USA
| | - Shawn P. Serbin
- Brookhaven National Laboratory, Environmental and Climate Sciences Department, Upton, NY, USA
| | | | - Kevin R. Wilcox
- Ecosystem Science and Management, University of Wyoming, Laramie, WY, USA
| | | | | |
|
26
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
| | - J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
| | - Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
| | - Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
| | - Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
| |
|
27
|
Bernard R, Weissgerber T, Bobrov E, Winham S, Dirnagl U, Riedel N. fiddle: a tool to combat publication bias by getting research out of the file drawer and into the scientific community. Clin Sci (Lond) 2020; 134:2729-2739. [PMID: 33111948 PMCID: PMC7593522 DOI: 10.1042/cs20201125] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/27/2020] [Revised: 09/30/2020] [Accepted: 10/02/2020] [Indexed: 01/10/2023]
Abstract
Statistically significant findings are more likely to be published than non-significant or null findings, leaving scientists and healthcare personnel to make decisions based on distorted scientific evidence. Continuously expanding 'file drawers' of unpublished data from well-designed experiments waste resources and create problems for researchers, the scientific community and the public. There is limited awareness of the negative impact that publication bias and selective reporting have on the scientific literature. Alternative publication formats have recently been introduced that make it easier to publish research that is difficult to publish in traditional peer-reviewed journals. These include micropublications, data repositories, data journals, preprints, publishing platforms, and journals focusing on null or neutral results. While these alternative formats have the potential to reduce publication bias, many scientists are unaware that these formats exist and don't know how to use them. Our open source file drawer data liberation effort (fiddle) tool (RRID:SCR_017327 available at: http://s-quest.bihealth.org/fiddle/) is a match-making Shiny app designed to help biomedical researchers to identify the most appropriate publication format for their data. Users can search for a publication format that meets their needs, compare and contrast different publication formats, and find links to publishing platforms. This tool will assist scientists in getting otherwise inaccessible, hidden data out of the file drawer into the scientific community and literature. We briefly highlight essential details that should be included to ensure reporting quality, which will allow others to use and benefit from research published in these new formats.
Affiliation(s)
- René Bernard
- NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- QUEST Center for Transforming Biomedical Research, Berlin Institute of Health (BIH), Berlin, Germany
| | - Tracey L. Weissgerber
- QUEST Center for Transforming Biomedical Research, Berlin Institute of Health (BIH), Berlin, Germany
| | - Evgeny Bobrov
- QUEST Center for Transforming Biomedical Research, Berlin Institute of Health (BIH), Berlin, Germany
| | - Stacey J. Winham
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - Ulrich Dirnagl
- NeuroCure Cluster of Excellence, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- QUEST Center for Transforming Biomedical Research, Berlin Institute of Health (BIH), Berlin, Germany
| | - Nico Riedel
- QUEST Center for Transforming Biomedical Research, Berlin Institute of Health (BIH), Berlin, Germany
| |
|
28
|
Uddin MA, Stranieri A, Gondal I, Balasubramanian V. Rapid health data repository allocation using predictive machine learning. Health Informatics J 2020; 26:3009-3036. [PMID: 32969296 DOI: 10.1177/1460458220957486] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/26/2022]
Abstract
Health-related data is stored in a number of repositories that are managed and controlled by different entities. For instance, Electronic Health Records are usually administered by governments. Electronic Medical Records are typically controlled by health care providers, whereas Personal Health Records are managed directly by patients. Recently, Blockchain-based health record systems largely regulated by technology have emerged as another type of repository. Repositories for storing health data differ from one another based on cost, level of security and quality of performance. Not only have the types of repositories increased in recent years, but the volume of health data to be stored has also increased. For instance, the advent of wearable sensors that capture physiological signs has resulted in an exponential growth in digital health data. The increase in the types of repository and amount of data has driven a need for intelligent processes to select appropriate repositories as data is collected. However, the storage allocation decision is complex and nuanced. The challenges are exacerbated when health data are continuously streamed, as is the case with wearable sensors. Although patients are not always solely responsible for determining which repository should be used, they typically have some input into this decision. Patients can be expected to have idiosyncratic preferences regarding storage decisions depending on their unique contexts. In this paper, we propose a predictive model for the storage of health data that can meet patient needs and make storage decisions rapidly, in real-time, even with data streaming from wearable sensors. The model is built with a machine learning classifier that learns the mapping between characteristics of health data and features of storage repositories from a training set generated synthetically from correlations evident from small samples of experts. Results from the evaluation demonstrate the viability of the machine learning technique used.
|
29
|
Tzanova S. Changes in academic libraries in the era of Open Science. Education for Information 2020. [DOI: 10.3233/efi-190259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
In this paper we study the changes in academic library services inspired by the Open Science movement, especially the changes prompted by Open Data as a founding part of Open Science. We argue that academic libraries face even bigger challenges in accommodating and supporting Open Big Data, which is composed of existing raw data sets and new massive sets generated by data-driven research. Ensuring the veracity of Open Big Data is a complex problem dominated by data science. For academic libraries, that challenge not only triggers the expansion of traditional library services, but also leads to the adoption of a set of new roles and responsibilities. These include, but are not limited to, developing supporting models for Research Data Management, providing Data Management Plan assistance, expanding the qualifications of library personnel toward data science literacy, and integrating library services into the research and educational process by taking part in research grants. We outline several approaches taken by some academic libraries and by libraries at the City University of New York (CUNY) to meet the needs imposed by doing research and education with Open Big Data, ranging from changes in libraries' administrative structure and in personnel qualifications and duties, through leading interdisciplinary advisory groups, to active collaboration in principal projects.
|
30
|
George CH, Alexander SPH, Cirino G, Docherty JR, Hoyer D, Insel PA, Izzo AA, Ji Y, Panettieri RA, Sobey CG, Stanford SC, Stefanska B, Stephens G, Teixeira M, Ahluwalia A. The BJP expects authors to share data. Br J Pharmacol 2020; 176:4595-4598. [PMID: 31950490 DOI: 10.1111/bph.14907] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022] Open
Affiliation(s)
- Daniel Hoyer
- The University of Melbourne, Parkville, Victoria, Australia; Florey Institute of Neuroscience and Mental Health, Parkville, Victoria, Australia; The Scripps Research Institute, La Jolla, CA, USA
| | - Yong Ji
- Nanjing Medical University, Nanjing, China
| | - Barbara Stefanska
- The University of British Columbia, Vancouver, British Columbia, Canada
| | - Mauro Teixeira
- Federal University of Minas Gerais, Belo Horizonte, Brazil
| | - Amrita Ahluwalia
- William Harvey Research Institute, Queen Mary University of London, London, UK
|
31
|
Choi Y, Bae HJ, Lee AC, Choi H, Lee D, Ryu T, Hyun J, Kim S, Kim H, Song SH, Kim K, Park W, Kwon S. DNA Micro-Disks for the Management of DNA-Based Data Storage with Index and Write-Once-Read-Many (WORM) Memory Features. Advanced Materials (Deerfield Beach, Fla.) 2020; 32:e2001249. [PMID: 32725925 DOI: 10.1002/adma.202001249] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 02/22/2020] [Revised: 06/03/2020] [Indexed: 05/25/2023]
Abstract
DNA-based data storage has attracted attention because of its higher physical density of the data and longer retention time than those of conventional digital data storage. However, previous DNA-based data storage lacked index features and the data quality of storage after a single access was not preserved, obstructing its industrial use. Here, DNA micro-disks, QR-coded micro-sized disks that harbor data-encoded DNA molecules for the efficient management of DNA-based data storage, are proposed. The two major features that previous DNA-based data-storage studies could not achieve are demonstrated. One feature is accessing data items efficiently by indexing the data-encoded DNA library. Another is achieving write-once-read-many (WORM) memory through the immobilization of DNA molecules on the disk and their enrichment through in situ DNA production. Through these features, the reliability of DNA-based data storage is increased by allowing selective and multiple accession of data-encoded DNA with lower data loss than previous DNA-based data storage methods.
Affiliation(s)
- Yeongjae Choi
- Nano Systems Institute, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Hyung Jong Bae
- Department of Electrical and Computer Engineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Amos C Lee
- Interdisciplinary Program for Bioengineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Hansol Choi
- Department of Electrical and Computer Engineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Daewon Lee
- Nano Systems Institute, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- BK21+ Creative Research Engineer Development for IT, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Taehoon Ryu
- Celemics Inc., 131, Gasandigital 1-ro, Geumcheon-gu, Seoul, 08506, Republic of Korea
| | - Jinwoo Hyun
- Department of Electrical and Computer Engineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
| | - Seojoo Kim
- Department of Electronic Engineering, Kyung Hee University, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, 17104, Republic of Korea
| | - Hyeli Kim
- Department of Electronic Engineering, Kyung Hee University, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, 17104, Republic of Korea
| | - Suk-Heung Song
- Department of Electronic Engineering, Kyung Hee University, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, 17104, Republic of Korea
| | - Kibeom Kim
- Department of Electronic Engineering, Kyung Hee University, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, 17104, Republic of Korea
| | - Wook Park
- Department of Electronic Engineering, Kyung Hee University, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, 17104, Republic of Korea
- Institute for Wearable Convergence Electronics, Kyung Hee University, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do, 17104, Republic of Korea
| | - Sunghoon Kwon
- Nano Systems Institute, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- Department of Electrical and Computer Engineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- Interdisciplinary Program for Bioengineering, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- Institute of Entrepreneurial Bio Convergence, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
- Seoul National University Hospital Biomedical Research Institute, Seoul National University Hospital, 101, Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea
- Inter-University Semiconductor Research Center (ISRC), Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Republic of Korea
|
32
|
Roche DG, Granados M, Austin CC, Wilson S, Mitchell GM, Smith PA, Cooke SJ, Bennett JR. Open government data and environmental science: a federal Canadian perspective. Facets (Ott) 2020. [DOI: 10.1139/facets-2020-0008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022] Open
Abstract
Governments worldwide are releasing data into the public domain via open government data initiatives. Many such data sets are directly relevant to environmental science and complement data collected by academic researchers to address complex and challenging environmental problems. The Government of Canada is a leader in open data among Organisation for Economic Co-operation and Development countries, generating and releasing troves of valuable research data. However, achieving comprehensive and FAIR (findable, accessible, interoperable, reusable) open government data is not without its challenges. For example, identifying and understanding Canada’s international commitments, policies, and guidelines on open data can be daunting. Similarly, open data sets within the Government of Canada are spread across a diversity of repositories and portals, which may hinder their discoverability. We describe Canada’s federal initiatives promoting open government data, and outline where data sets of relevance to environmental science can be found. We summarize research data management challenges identified by the Government of Canada, plans to modernize the approach to open data for environmental science and best practices for data discoverability, access, and reuse.
Affiliation(s)
- Dominique G. Roche
- Canadian Centre for Evidence-Based Conservation, Department of Biology and Institute of Environmental and Interdisciplinary Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Monica Granados
- Science and Technology Strategies Directorate, Environment and Climate Change Canada, Gatineau, QC K1A 0H3, Canada
| | - Claire C. Austin
- Science and Technology Strategies Directorate, Environment and Climate Change Canada, Gatineau, QC K1A 0H3, Canada
| | - Scott Wilson
- Canadian Wildlife Service, Environment and Climate Change Canada, Gatineau, QC K1A 0H3, Canada
| | - Gregory M. Mitchell
- Canadian Wildlife Service, Environment and Climate Change Canada, Gatineau, QC K1A 0H3, Canada
| | - Paul A. Smith
- Canadian Wildlife Service, Environment and Climate Change Canada, Gatineau, QC K1A 0H3, Canada
| | - Steven J. Cooke
- Canadian Centre for Evidence-Based Conservation, Department of Biology and Institute of Environmental and Interdisciplinary Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
| | - Joseph R. Bennett
- Canadian Centre for Evidence-Based Conservation, Department of Biology and Institute of Environmental and Interdisciplinary Sciences, Carleton University, Ottawa, ON K1S 5B6, Canada
|
33
|
Borstelmann SM. Machine Learning Principles for Radiology Investigators. Acad Radiol 2020; 27:13-25. [PMID: 31818379 DOI: 10.1016/j.acra.2019.07.030] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 05/28/2019] [Revised: 07/04/2019] [Accepted: 07/11/2019] [Indexed: 12/16/2022]
Abstract
Artificial intelligence and deep learning are areas of high interest for radiology investigators at present. However, the field of machine learning encompasses multiple statistics-based techniques useful for investigators, which may be complementary to deep learning approaches. After a refresher in basic statistical concepts, relevant considerations for machine learning practitioners are reviewed: regression, classification, decision boundaries, and the bias-variance tradeoff. Regularization, ground truth, and populations are discussed along with compute and data management principles. Advanced statistical machine learning techniques including bootstrapping, bagging, boosting, decision trees, random forest, XGBoost, and support vector machines are reviewed along with relevant examples from the radiology literature.
|
34
|
Guerrero S, López-Cortés A, García-Cárdenas JM, Saa P, Indacochea A, Armendáriz-Castillo I, Zambrano AK, Yumiceba V, Pérez-Villa A, Guevara-Ramírez P, Moscoso-Zea O, Paredes J, Leone PE, Paz-y-Miño C. A quick guide for using Microsoft OneNote as an electronic laboratory notebook. PLoS Comput Biol 2019; 15:e1006918. [PMID: 31071077 PMCID: PMC6508581 DOI: 10.1371/journal.pcbi.1006918] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
Scientific data recording and reporting systems are of great interest for promoting reproducibility and transparency practices among the scientific community. Current research generates large datasets that can no longer be documented using paper lab notebooks (PLNs). In this regard, electronic laboratory notebooks (ELNs) could be a promising solution to replace PLNs and promote scientific reproducibility and transparency. We previously analyzed five ELNs and performed two survey-based studies to implement an ELN in a biomedical research institute. Among the ELNs tested, we found that Microsoft OneNote offers numerous features associated with the best ELN functionality. In addition, both surveyed groups preferred OneNote over a scientifically designed ELN (PerkinElmer Elements). However, OneNote remains a general note-taking application and has not been designed for scientific purposes. We therefore provide a quick guide to adapt OneNote to an ELN workflow that can also be adjusted to other nonscientific ELNs.
Affiliation(s)
- Santiago Guerrero
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
- * E-mail: (SG); (CPyM)
| | - Andrés López-Cortés
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
- RNASA-IMEDIR, Computer Sciences Faculty, University of Coruna, Coruna, Spain
| | - Jennyfer M. García-Cárdenas
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Pablo Saa
- Faculty of Engineering Sciences, Universidad UTE, Quito, Ecuador
| | - Alberto Indacochea
- Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Oncology and Molecular Pathology Research Group-VHIR-Vall d’ Hebron Institut de Recerca-Vall d’ Hebron Hospital, Barcelona, Spain
| | - Isaac Armendáriz-Castillo
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Ana Karina Zambrano
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Verónica Yumiceba
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Andy Pérez-Villa
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Patricia Guevara-Ramírez
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Joel Paredes
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
- Faculty of Engineering Sciences, Universidad UTE, Quito, Ecuador
| | - Paola E. Leone
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - César Paz-y-Miño
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
- * E-mail: (SG); (CPyM)
|
35
|
Tang YA, Pichler K, Füllgrabe A, Lomax J, Malone J, Munoz-Torres MC, Vasant DV, Williams E, Haendel M. Ten quick tips for biocuration. PLoS Comput Biol 2019; 15:e1006906. [PMID: 31048830 PMCID: PMC6497217 DOI: 10.1371/journal.pcbi.1006906] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022] Open
Affiliation(s)
- Y. Amy Tang
- Genestack Limited, Cambridge, Cambridgeshire, United Kingdom
| | - Klemens Pichler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, United Kingdom
| | - Anja Füllgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, United Kingdom
| | - Jane Lomax
- SciBite Limited, BioData Innovation Centre, Hinxton, Cambridgeshire, United Kingdom
| | - James Malone
- SciBite Limited, BioData Innovation Centre, Hinxton, Cambridgeshire, United Kingdom
| | - Drashtti V. Vasant
- Bayer Business Services GmbH, BP Research and Development, Translational Sciences, Berlin, Germany
| | - Eleanor Williams
- Centre for Gene Regulation and Expression, School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Genomics England, Queen Mary University of London, London, United Kingdom
| | - Melissa Haendel
- Linus Pauling Institute, Oregon State University, Corvallis, Oregon, United States of America
|
36
|
Olsson TS, Hartley M. Lightweight data management with dtool. PeerJ 2019; 7:e6562. [PMID: 30867992 PMCID: PMC6409086 DOI: 10.7717/peerj.6562] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/04/2018] [Accepted: 02/04/2019] [Indexed: 11/29/2022] Open
Abstract
The explosion in volumes and types of data has led to substantial challenges in data management. These challenges are often faced by front-line researchers who are already dealing with rapidly changing technologies and have limited time to devote to data management. There are good high-level guidelines for managing and processing scientific data. However, there is a lack of simple, practical tools to implement these guidelines. This is particularly problematic in a highly distributed research environment where needs differ substantially from group to group and centralised solutions are difficult to implement and storage technologies change rapidly. To meet these challenges we have developed dtool, a command line tool for managing data. The tool packages data and metadata into a unified whole, which we call a dataset. The dataset provides consistency checking and the ability to access metadata for both the whole dataset and individual files. The tool can store these datasets on several different storage systems, including a traditional file system, object store (S3 and Azure) and iRODS. It includes an application programming interface that can be used to incorporate it into existing pipelines and workflows. The tool has provided substantial process, cost, and peace-of-mind benefits to our data management practices and we want to share these benefits. The tool is open source and available freely online at http://dtool.readthedocs.io.
Affiliation(s)
- Tjelvar S.G. Olsson
- Computational Systems Biology, John Innes Centre, Norwich, United Kingdom
| | - Matthew Hartley
- Computational Systems Biology, John Innes Centre, Norwich, United Kingdom
|
37
|
Bartomeus I, Stavert JR, Ward D, Aguado O. Historical collections as a tool for assessing the global pollination crisis. Philos Trans R Soc Lond B Biol Sci 2018; 374:rstb.2017.0389. [PMID: 30455207 DOI: 10.1098/rstb.2017.0389] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Accepted: 06/19/2018] [Indexed: 11/12/2022] Open
Abstract
There is increasing concern about the decline of pollinators worldwide. However, despite reports that pollinator declines are widespread, data are scarce and often geographically and taxonomically biased. These biases limit robust inference about any potential pollinator crisis. Non-structured and opportunistic historical specimen collection data provide the only source of historical information which can serve as a baseline for identifying pollinator declines. Specimens historically collected and preserved in museums not only provide information on where and when species were collected, but also contain other ecological information such as species interactions and morphological traits. Here, we provide a synthesis of how researchers have used historical data to identify long-term changes in biodiversity, species abundances, morphology and pollination services. Despite recent advances, we show that information on the status and trends of most pollinators is absent. We highlight opportunities and limitations to progress the assessment of pollinator declines globally. Finally, we demonstrate different approaches to analysing museum collection data using two contrasting case studies from distinct geographical regions (New Zealand and Spain) for which long-term pollinator declines have never been assessed. There is immense potential for museum specimens to play a central role in assessing the extent of the global pollination crisis. This article is part of the theme issue 'Biological collections for understanding biodiversity in the Anthropocene'.
Affiliation(s)
- I Bartomeus
- Estación Biológica de Doñana (EBD-CSIC), Avda. Américo Vespucio 26, Isla de la Cartuja, 41092 Sevilla, Spain
| | - J R Stavert
- Centre for Biodiversity and Biosecurity, School of Biological Sciences, The University of Auckland, Auckland, New Zealand
| | - D Ward
- Centre for Biodiversity and Biosecurity, School of Biological Sciences, The University of Auckland, Auckland, New Zealand; Landcare Research, Auckland, New Zealand
| | - O Aguado
- Andrena Iniciativas y Estudios Medio Ambientales, Valladolid, Spain
|
38
|
|
39
|
Abstract
Genomics and molecular imaging, along with clinical and translational research, have transformed biomedical science into a data-intensive scientific endeavor. For researchers to benefit from Big Data sets, developing a long-term biomedical digital data preservation strategy is very important. In this opinion article, we discuss specific actions that researchers and institutions can take to make research data a continued resource even after research projects have reached the end of their lifecycle. The actions involve utilizing an Open Archival Information System model comprising six functional entities: Ingest, Access, Data Management, Archival Storage, Administration and Preservation Planning. We believe that involvement of data stewards early in the digital data life-cycle management process can significantly contribute towards long-term preservation of biomedical data. Developing data collection strategies consistent with institutional policies, and encouraging the use of common data elements in clinical research, patient registries and other human subject research can be advantageous for data sharing and integration purposes. Specifically, data stewards at the onset of a research program should engage with established repositories and curators to develop data sustainability plans for research data. Placing equal importance on the requirements for initial activities (e.g., collection, processing, storage) and subsequent activities (data analysis, sharing) can improve data quality, provide traceability and support reproducibility. Preparing and tracking data provenance, and using common data elements and biomedical ontologies, are important for standardizing the data description, making the interpretation and reuse of data easier. The Big Data biomedical community requires a scalable platform that can support the diversity and complexity of data ingest modes (e.g., machine, software or human entry modes). Secure virtual workspaces to integrate and manipulate data, with shared software programs (e.g., bioinformatics tools), can facilitate the FAIR (Findable, Accessible, Interoperable and Reusable) use of data for near- and long-term research needs.
Affiliation(s)
- Vivek Navale
- Center for Information Technology, National Institutes of Health, Bethesda, Maryland, 20892, USA
| | - Matthew McAuliffe
- Center for Information Technology, National Institutes of Health, Bethesda, Maryland, 20892, USA
|
40
|
Olsen LR, Leipold MD, Pedersen CB, Maecker HT. The anatomy of single cell mass cytometry data. Cytometry A 2018; 95:156-172. [DOI: 10.1002/cyto.a.23621] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 06/18/2018] [Revised: 08/28/2018] [Accepted: 09/05/2018] [Indexed: 12/14/2022]
Affiliation(s)
- Lars R. Olsen
- Department of Bio and Health Informatics, Technical University of Denmark, Copenhagen, Denmark
- Center for Genomic Medicine, Copenhagen University Hospital, Copenhagen, Denmark
| | - Michael D. Leipold
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA
| | - Christina B. Pedersen
- Department of Bio and Health Informatics, Technical University of Denmark, Copenhagen, Denmark
- Center for Genomic Medicine, Copenhagen University Hospital, Copenhagen, Denmark
| | - Holden Terry Maecker
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA
|
41
|
Abstract
Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation. Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results. These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.
Affiliation(s)
- Sabrina Rueschenbaum
- Department of Internal Medicine 1, University Hospital Frankfurt, Goethe University, Theodor-Stern-Kai 7, Frankfurt (Main), Germany
|
42
|
Griffin PC, Khadake J, LeMay KS, Lewis SE, Orchard S, Pask A, Pope B, Roessner U, Russell K, Seemann T, Treloar A, Tyagi S, Christiansen JH, Dayalan S, Gladman S, Hangartner SB, Hayden HL, Ho WWH, Keeble-Gagnère G, Korhonen PK, Neish P, Prestes PR, Richardson MF, Watson-Haigh NS, Wyres KL, Young ND, Schneider MV. Best practice data life cycle approaches for the life sciences. F1000Res 2018; 6:1618. [PMID: 30109017 DOI: 10.12688/f1000research.12344.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Accepted: 08/17/2017] [Indexed: 11/20/2022] Open
Abstract
Throughout history, the life sciences have been revolutionised by technological advances; in our era this is manifested by advances in instrumentation for data generation, and consequently researchers now routinely handle large amounts of heterogeneous data in digital formats. The simultaneous transitions towards biology as a data science and towards a 'life cycle' view of research data pose new challenges. Researchers face a bewildering landscape of data management requirements, recommendations and regulations, without necessarily being able to access data management training or possessing a clear understanding of practical approaches that can assist in data management in their particular research domain. Here we provide an overview of best practice data life cycle approaches for researchers in the life sciences/bioinformatics space with a particular focus on 'omics' datasets and computer-based data processing and analysis. We discuss the different stages of the data life cycle and provide practical suggestions for useful tools and resources to improve data management practices.
Affiliation(s)
- Philippa C Griffin
- EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia; Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Jyoti Khadake
- NIHR BioResource, University of Cambridge and Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, CB2 0QQ, UK
| | - Kate S LeMay
- Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
| | - Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Berkeley, CA, 94720, USA
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge, CB10 1SD, UK
| | - Andrew Pask
- School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Bernard Pope
- Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Ute Roessner
- Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Keith Russell
- Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
| | - Torsten Seemann
- Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Andrew Treloar
- Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
| | - Sonika Tyagi
- Australian Genome Research Facility Ltd, Parkville, VIC, 3052, Australia; Monash Bioinformatics Platform, Monash University, Clayton, VIC, 3800, Australia
| | - Jeffrey H Christiansen
- Queensland Cyber Infrastructure Foundation and the University of Queensland Research Computing Centre, St Lucia, QLD, 4072, Australia
| | - Saravanan Dayalan
- Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Simon Gladman
- EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Sandra B Hangartner
- School of Biological Sciences, Monash University, Clayton, VIC, 3800, Australia
| | - Helen L Hayden
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
| | - William W H Ho
- School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Gabriel Keeble-Gagnère
- School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia; Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
| | - Pasi K Korhonen
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Peter Neish
- The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Priscilla R Prestes
- Faculty of Science and Engineering, Federation University Australia, Mt Helen, VIC, 3350, Australia
| | - Mark F Richardson
- Bioinformatics Core Research Group & Centre for Integrative Ecology, Deakin University, Geelong, VIC, 3220, Australia
| | - Nathan S Watson-Haigh
- School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
| | - Kelly L Wyres
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Neil D Young
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Maria Victoria Schneider
- Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia; The University of Melbourne, Parkville, VIC, 3010, Australia
| |
|
43
|
Schloss PD. Identifying and Overcoming Threats to Reproducibility, Replicability, Robustness, and Generalizability in Microbiome Research. mBio 2018; 9:e00525-18. [PMID: 29871915 PMCID: PMC5989067 DOI: 10.1128/mbio.00525-18] [Citation(s) in RCA: 127] [Impact Index Per Article: 18.1] [Indexed: 12/31/2022] Open
Abstract
The "reproducibility crisis" in science affects microbiology as much as any other area of inquiry, and microbiologists have long struggled to make their research reproducible. We need to respect that ensuring that our methods and results are sufficiently transparent is difficult. This difficulty is compounded in interdisciplinary fields such as microbiome research. There are many reasons why a researcher is unable to reproduce a previous result, and even if a result is reproducible, it may not be correct. Furthermore, failures to reproduce previous results have much to teach us about the scientific process and microbial life itself. This Perspective delineates a framework for identifying and overcoming threats to reproducibility, replicability, robustness, and generalizability of microbiome research. Instead of seeing signs of a crisis in others' work, we need to appreciate the technical and social difficulties that limit reproducibility in the work of others as well as our own.
Affiliation(s)
- Patrick D Schloss
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
| |
|
44
|
Durden JM, Luo JY, Alexander H, Flanagan AM, Grossmann L. Integrating “Big Data” into Aquatic Ecology: Challenges and Opportunities. Limnol Oceanogr Bull 2017. [DOI: 10.1002/lob.10213] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022]
|
45
|
Griffin PC, Khadake J, LeMay KS, Lewis SE, Orchard S, Pask A, Pope B, Roessner U, Russell K, Seemann T, Treloar A, Tyagi S, Christiansen JH, Dayalan S, Gladman S, Hangartner SB, Hayden HL, Ho WWH, Keeble-Gagnère G, Korhonen PK, Neish P, Prestes PR, Richardson MF, Watson-Haigh NS, Wyres KL, Young ND, Schneider MV. Best practice data life cycle approaches for the life sciences. F1000Res 2017; 6:1618. [PMID: 30109017 PMCID: PMC6069748 DOI: 10.12688/f1000research.12344.2] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Accepted: 05/29/2018] [Indexed: 11/20/2022] Open
|
46
|
Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK. Good enough practices in scientific computing. PLoS Comput Biol 2017; 13:e1005510. [DOI: 10.1371/journal.pcbi.1005510]
Abstract
Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow the same practices as lab projects and notebooks, with organized data, documented steps, and the project structured for reproducibility, but researchers new to computing often don't know where to start. This paper presents a set of good computing practices that every researcher can adopt, regardless of their current level of computational skill. These practices, which encompass data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide variety of published sources, from our daily lives, and from our work with volunteer organizations that have delivered workshops to over 11,000 people since 2010.
Affiliation(s)
- Greg Wilson
- Software Carpentry Foundation, Austin, Texas, United States of America
| | - Jennifer Bryan
- RStudio and Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Karen Cranston
- Department of Biology, Duke University, Durham, North Carolina, United States of America
| | - Justin Kitzes
- Energy and Resources Group, University of California, Berkeley, Berkeley, California, United States of America
| | - Lex Nederbragt
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
| | - Tracy K. Teal
- Data Carpentry, Davis, California, United States of America
| |
|