Reference Citation Analysis

For: Karimzadeh M, Hoffman MM. Top considerations for creating bioinformatics software documentation. Brief Bioinform 2019;19:693-699. [PMID: 28088754 PMCID: PMC6054259 DOI: 10.1093/bib/bbw134] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 07/22/2016] [Indexed: 11/20/2022] Open

Cited by Other Article(s)

Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024;25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open

Abstract

BACKGROUND

Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.

RESULTS

We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation.

CONCLUSIONS

No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.

Affiliation(s)

Hongrui Duo College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Yinghong Li Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
Yang Lan Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
Jingxin Tao College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Qingxia Yang Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
Yingxue Xiao College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Jing Sun College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Lei Li College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Xiner Nie Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
Xiaoxi Zhang College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
Guizhao Liang Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
Mingwei Liu Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
Youjin Hao College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
Bo Li College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.

Bai J, Kamatchinathan S, Kundu DJ, Bandla C, Vizcaíno JA, Perez-Riverol Y. Open-source large language models in action: A bioinformatics chatbot for PRIDE database. Proteomics 2024:e2400005. [PMID: 38556628 DOI: 10.1002/pmic.202400005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 04/02/2024]

Loreto ELS, Melo ESD, Wallau GL, Gomes TMFF. The good, the bad and the ugly of transposable elements annotation tools. Genet Mol Biol 2024;46:e20230138. [PMID: 38373163 PMCID: PMC10876081 DOI: 10.1590/1678-4685-gmb-2023-0138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 11/26/2023] [Indexed: 02/21/2024] Open

Chicco D, Spolaor S, Nobile MS. Ten quick tips for fuzzy logic modeling of biomedical systems. PLoS Comput Biol 2023;19:e1011700. [PMID: 38127800 PMCID: PMC10734980 DOI: 10.1371/journal.pcbi.1011700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023] Open

Lubiana T, Lopes R, Medeiros P, Silva JC, Goncalves ANA, Maracaja-Coutinho V, Nakaya HI. Ten quick tips for harnessing the power of ChatGPT in computational biology. PLoS Comput Biol 2023;19:e1011319. [PMID: 37561669 PMCID: PMC10414555 DOI: 10.1371/journal.pcbi.1011319] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 08/12/2023] Open

Chicco D, Cumbo F, Angione C. Ten quick tips for avoiding pitfalls in multi-omics data integration analyses. PLoS Comput Biol 2023;19:e1011224. [PMID: 37410704 DOI: 10.1371/journal.pcbi.1011224] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/08/2023] Open

Chicco D, Ferraro Petrillo U, Cattaneo G. Ten quick tips for bioinformatics analyses using an Apache Spark distributed computing environment. PLoS Comput Biol 2023;19:e1011272. [PMID: 37471333 PMCID: PMC10358940 DOI: 10.1371/journal.pcbi.1011272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2023] Open

Greenberg ZF, Graim KS, He M. Towards artificial intelligence-enabled extracellular vesicle precision drug delivery. Adv Drug Deliv Rev 2023:114974. [PMID: 37356623 DOI: 10.1016/j.addr.2023.114974] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/06/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 06/27/2023]

Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023;19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]

Abstract

BACKGROUND

Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS).

AIM OF REVIEW

This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in the LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software.

KEY SCIENTIFIC CONCEPTS OF REVIEW

We evaluated 124 LC-HRMS metabolomics data processing software obtained from a systematic review and selected 61 software for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment of software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that covered multiple FAIR4RS categories but had a low percentage of fulfillment: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. According to the results, we discussed improvement strategies and future directions.

Chicco D, Oneto L, Tavazzi E. Eleven quick tips for data cleaning and feature engineering. PLoS Comput Biol 2022;18:e1010718. [PMID: 36520712 PMCID: PMC9754225 DOI: 10.1371/journal.pcbi.1010718] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/23/2022] Open

Nine quick tips for pathway enrichment analysis. PLoS Comput Biol 2022;18:e1010348. [PMID: 35951505 PMCID: PMC9371296 DOI: 10.1371/journal.pcbi.1010348] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023] Open

Usability evaluation of circRNA identification tools: Development of a heuristic-based framework and analysis. Comput Biol Med 2022;147:105785. [PMID: 35780604 DOI: 10.1016/j.compbiomed.2022.105785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 05/23/2022] [Accepted: 06/26/2022] [Indexed: 11/21/2022]

Niu YN, Roberts EG, Denisko D, Hoffman MM. Assessing and assuring interoperability of a genomics file format. Bioinformatics 2022;38:3327-3336. [PMID: 35575355 PMCID: PMC9237710 DOI: 10.1093/bioinformatics/btac327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 03/30/2022] [Accepted: 05/11/2022] [Indexed: 12/01/2022] Open

Hermann S, Fehr J. Documenting research software in engineering science. Sci Rep 2022;12:6567. [PMID: 35449149 PMCID: PMC9023583 DOI: 10.1038/s41598-022-10376-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 04/05/2022] [Indexed: 11/09/2022] Open

van der Putten BCL, Mendes CI, Talbot BM, de Korne-Elenbaas J, Mamede R, Vila-Cerqueira P, Coelho LP, Gulvik CA, Katz LS, The ASM NGS Hackathon Participants. Software testing in microbial bioinformatics: a call to action. Microb Genom 2022;8. [PMID: 35259087 PMCID: PMC9176277 DOI: 10.1099/mgen.0.000790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open

Noor A. Improving bioinformatics software quality through incorporation of software engineering practices. PeerJ Comput Sci 2022;8:e839. [PMID: 35111923 PMCID: PMC8771759 DOI: 10.7717/peerj-cs.839] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/01/2021] [Accepted: 12/13/2021] [Indexed: 06/14/2023]

Heil BJ, Hoffman MM, Markowetz F, Lee SI, Greene CS, Hicks SC. Reproducibility standards for machine learning in the life sciences. Nat Methods 2021;18:1132-1135. [PMID: 34462593 PMCID: PMC9131851 DOI: 10.1038/s41592-021-01256-7] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Indexed: 02/08/2023]

Wratten L, Wilm A, Göke J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nat Methods 2021;18:1161-1168. [PMID: 34556866 DOI: 10.1038/s41592-021-01254-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 10/20/2020] [Accepted: 07/29/2021] [Indexed: 02/08/2023]

Chang HY, Colby SM, Du X, Gomez JD, Helf MJ, Kechris K, Kirkpatrick CR, Li S, Patti GJ, Renslow RS, Subramaniam S, Verma M, Xia J, Young JD. A Practical Guide to Metabolomics Software Development. Anal Chem 2021;93:1912-1923. [PMID: 33467846 PMCID: PMC7859930 DOI: 10.1021/acs.analchem.0c03581] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/22/2022]

Affiliation(s)

Hui-Yin Chang Department of Pathology, University of Michigan, 1301 Catherine Street, Ann Arbor, Michigan 48109, United States; Department of Biomedical Sciences and Engineering, National Central University, No. 300, Zhongda Road, Zhongli District, Taoyuan City 320, Taiwan
Sean M Colby Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999, MSIN: K8-98, Richland, Washington 99352, United States
Xiuxia Du Department of Bioinformatics & Genomics, University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, North Carolina 28223, United States
Javier D Gomez Department of Chemical and Biomolecular Engineering, Vanderbilt University, PMB 351604, 2301 Vanderbilt Place, Nashville, Tennessee 37235, United States
Maximilian J Helf Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, 533 Tower Road, Ithaca, New York 14853, United States
Katerina Kechris Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, 13001 East 17th Place B119, Aurora, Colorado 80045, United States
Christine R Kirkpatrick San Diego Supercomputer Center, University of California San Diego, MC 0505, 9500 Gilman Drive, La Jolla, California 92093, United States
Shuzhao Li The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, Connecticut 06032, United States
Gary J Patti Department of Chemistry, Department of Medicine, and Siteman Cancer Center, Washington University in St. Louis, CB 1134, One Brookings Drive, St. Louis, Missouri 63130, United States
Ryan S Renslow Biological Sciences Division, Pacific Northwest National Laboratory, P.O. Box 999, MSIN: K8-98, Richland, Washington 99352, United States; Gene and Linda Voiland School of Chemical Engineering and Bioengineering, Washington State University, P.O. Box 646515, Pullman, Washington 99164, United States
Shankar Subramaniam San Diego Supercomputer Center, University of California San Diego, MC 0505, 9500 Gilman Drive, La Jolla, California 92093, United States; Department of Bioengineering, Department of Computer Science and Engineering, Department of Cellular and Molecular Medicine, and Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive #0412, La Jolla, California 92093, United States
Mukesh Verma Epidemiology and Genomics Research Program, National Cancer Institute, National Institutes of Health, Suite 4E102, 9609 Medical Center Drive, MSC 9763, Rockville, Maryland 20850, United States
Jianguo Xia Faculty of Agricultural and Environmental Sciences, McGill University, 21111 Lakeshore Road, Ste. Anne de Bellevue, Quebec H9X 3V9, Canada
Jamey D Young Department of Chemical and Biomolecular Engineering, Vanderbilt University, PMB 351604, 2301 Vanderbilt Place, Nashville, Tennessee 37235, United States; Department of Molecular Physiology and Biophysics, Vanderbilt University, PMB 351604, 2301 Vanderbilt Place, Nashville, Tennessee 37235, United States

Maguire F, Jia B, Gray KL, Lau WYV, Beiko RG, Brinkman FSL. Metagenome-assembled genome binning methods with short reads disproportionately fail for plasmids and genomic Islands. Microb Genom 2020;6:mgen000436. [PMID: 33001022 PMCID: PMC7660262 DOI: 10.1099/mgen.0.000436] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 05/23/2020] [Accepted: 09/04/2020] [Indexed: 12/12/2022] Open

Carper DL, Lawrence TJ, Carrell AA, Pelletier DA, Weston DJ. DISCo-microbe: design of an identifiable synthetic community of microbes. PeerJ 2020;8:e8534. [PMID: 32149021 PMCID: PMC7049465 DOI: 10.7717/peerj.8534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/13/2019] [Accepted: 01/08/2020] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Microbiomes are extremely important for their host organisms, providing many vital functions and extending their hosts' phenotypes. Natural studies of host-associated microbiomes can be difficult to interpret due to the high complexity of microbial communities, which hinders our ability to track and identify individual members along with the many factors that structure or perturb those communities. For this reason, researchers have turned to synthetic or constructed communities in which the identities of all members are known. However, due to the lack of tracking methods and the difficulty of creating a more diverse and identifiable community that can be distinguished through next-generation sequencing, most such in vivo studies have used only a few strains.

RESULTS

To address this issue, we developed DISCo-microbe, a program for the design of an identifiable synthetic community of microbes for use in in vivo experimentation. The program is composed of two modules; (1) create, which allows the user to generate a highly diverse community list from an input DNA sequence alignment using a custom nucleotide distance algorithm, and (2) subsample, which subsamples the community list to either represent a number of grouping variables, including taxonomic proportions, or to reach a user-specified maximum number of community members. As an example, we demonstrate the generation of a synthetic microbial community that can be distinguished through amplicon sequencing. The synthetic microbial community in this example consisted of 2,122 members from a starting DNA sequence alignment of 10,000 16S rRNA sequences from the Ribosomal Database Project. We generated simulated Illumina sequencing data from the constructed community and demonstrate that DISCo-microbe is capable of designing diverse communities with members distinguishable by amplicon sequencing. Using the simulated data we were able to recover sequences from between 97-100% of community members using two different post-processing workflows. Furthermore, 97-99% of sequences were assigned to a community member with zero sequences being misidentified. We then subsampled the community list using taxonomic proportions to mimic a natural plant host-associated microbiome, ultimately yielding a diverse community of 784 members.

CONCLUSIONS

DISCo-microbe can create a highly diverse community list of microbes that can be distinguished through 16S rRNA gene sequencing, and has the ability to subsample (i.e., design) the community for the desired number of members and taxonomic proportions. Although developed for bacteria, the program allows for any alignment input from any taxonomic group, making it broadly applicable. The software and data are freely available from GitHub (https://github.com/dlcarper/DISCo-microbe) and Python Package Index (PYPI).

Georgeson P, Syme A, Sloggett C, Chung J, Dashnow H, Milton M, Lonsdale A, Powell D, Seemann T, Pope B. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 2019;8:giz109. [PMID: 31544213 PMCID: PMC6755254 DOI: 10.1093/gigascience/giz109] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/29/2019] [Revised: 07/16/2019] [Accepted: 08/13/2019] [Indexed: 11/14/2022] Open

Abstract

BACKGROUND

Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results.

FINDINGS

We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization.

CONCLUSIONS

Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.

Affiliation(s)

Peter Georgeson Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053 Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000
Anna Syme Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053 Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, Victoria, Australia 3004
Clare Sloggett Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
Jessica Chung Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
Harriet Dashnow Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052 School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
Michael Milton Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053 Melbourne Genomics Health Alliance, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, Australia 3052
Andrew Lonsdale Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052 ARC Centre of Excellence in Plant Cell Walls, School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
David Powell Monash Bioinformatics Platform, Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, 15 Innovation Walk, Monash University, Clayton, Victoria, Australia 3800
Torsten Seemann Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053 Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street Melbourne, Victoria, Australia 3000
Bernard Pope Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053 Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000 Department of Medicine, Central Clinical School, Monash University, Clayton, Victoria, Australia 3800

Sumonja N, Gemovic B, Veljkovic N, Perovic V. Automated feature engineering improves prediction of protein-protein interactions. Amino Acids 2019;51:1187-1200. [PMID: 31278492 DOI: 10.1007/s00726-019-02756-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/05/2019] [Accepted: 06/26/2019] [Indexed: 10/26/2022]

Mangul S, Mosqueiro T, Abdill RJ, Duong D, Mitchell K, Sarwal V, Hill B, Brito J, Littman RJ, Statz B, Lam AKM, Dayama G, Grieneisen L, Martin LS, Flint J, Eskin E, Blekhman R. Challenges and recommendations to improve the installability and archival stability of omics computational tools. PLoS Biol 2019;17:e3000333. [PMID: 31220077 PMCID: PMC6605654 DOI: 10.1371/journal.pbio.3000333] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Revised: 07/02/2019] [Indexed: 01/07/2023] Open

Affiliation(s)

Serghei Mangul Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
Thiago Mosqueiro Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
Richard J. Abdill Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
Dat Duong Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
Keith Mitchell Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
Varuni Sarwal Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
Brian Hill Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
Jaqueline Brito Institute of Mathematics and Computer Science, University of São Paulo, São Paulo, Brazil
Russell Jared Littman Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
Benjamin Statz Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
Angela Ka-Mei Lam Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
Gargi Dayama Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
Laura Grieneisen Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
Lana S. Martin Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
Jonathan Flint Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America
Eleazar Eskin Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
Ran Blekhman Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America Department of Ecology, Evolution, and Behavior, University of Minnesota, Minnesota, United States of America

Hart SN. Will Digital Pathology be as Disruptive as Genomics? J Pathol Inform 2018;9:27. [PMID: 30167342 PMCID: PMC6106127 DOI: 10.4103/jpi.jpi_25_18] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/09/2018] [Accepted: 06/20/2018] [Indexed: 11/23/2022] Open

Chicco D. Ten quick tips for machine learning in computational biology. BioData Min 2017;10:35. [PMID: 29234465 PMCID: PMC5721660 DOI: 10.1186/s13040-017-0155-3] [Citation(s) in RCA: 319] [Impact Index Per Article: 45.6] [Received: 08/31/2017] [Accepted: 11/08/2017] [Indexed: 11/12/2022] Open

Taschuk M, Wilson G. Ten simple rules for making research software more robust. PLoS Comput Biol 2017;13:e1005412. [PMID: 28407023 PMCID: PMC5390961 DOI: 10.1371/journal.pcbi.1005412] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 11/19/2022] Open