1
Swetnam TL, Antin PB, Bartelme R, Bucksch A, Camhy D, Chism G, Choi I, Cooksey AM, Cosi M, Cowen C, Culshaw-Maurer M, Davey R, Davey S, Devisetty U, Edgin T, Edmonds A, Fedorov D, Frady J, Fonner J, Gillan JK, Hossain I, Joyce B, Lang K, Lee T, Littin S, McEwen I, Merchant N, Micklos D, Nelson A, Ramsey A, Roberts S, Sarando P, Skidmore E, Song J, Sprinkle MM, Srinivasan S, Stanzione D, Strootman JD, Stryeck S, Tuteja R, Vaughn M, Wali M, Wall M, Walls R, Wang L, Wickizer T, Williams J, Wregglesworth J, Lyons E. CyVerse: Cyberinfrastructure for open science. PLoS Comput Biol 2024; 20:e1011270. [PMID: 38324613 PMCID: PMC10878509 DOI: 10.1371/journal.pcbi.1011270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 02/20/2024] [Accepted: 11/27/2023] [Indexed: 02/09/2024] Open
Abstract
CyVerse, the largest publicly-funded open-source research cyberinfrastructure for life sciences, has played a crucial role in advancing data-driven research since the 2010s. As the technology landscape evolved with the emergence of cloud computing platforms, machine learning and artificial intelligence (AI) applications, CyVerse has enabled access by providing interfaces, Software as a Service (SaaS), and cloud-native Infrastructure as Code (IaC) to leverage new technologies. CyVerse services enable researchers to integrate institutional and private computational resources, custom software, perform analyses, and publish data in accordance with open science principles. Over the past 13 years, CyVerse has registered more than 124,000 verified accounts from 160 countries and was used for over 1,600 peer-reviewed publications. Since 2011, 45,000 students and researchers have been trained to use CyVerse. The platform has been replicated and deployed in three countries outside the US, with additional private deployments on commercial clouds for US government agencies and multinational corporations. In this manuscript, we present a strategic blueprint for creating and managing SaaS cyberinfrastructure and IaC as free and open-source software.
Affiliation(s)
- Tyson L. Swetnam
  - The University of Arizona, Tucson, Arizona, United States of America
- Parker B. Antin
  - The University of Arizona, Tucson, Arizona, United States of America
- Ryan Bartelme
  - The University of Arizona, Tucson, Arizona, United States of America
  - Pivot Bio, Berkeley, California, United States of America
- Alexander Bucksch
  - The University of Arizona, Tucson, Arizona, United States of America
- David Camhy
  - Graz University of Technology, Graz, Austria
- Greg Chism
  - The University of Arizona, Tucson, Arizona, United States of America
- Illyoung Choi
  - The University of Arizona, Tucson, Arizona, United States of America
- Amanda M. Cooksey
  - The University of Arizona, Tucson, Arizona, United States of America
- Michele Cosi
  - The University of Arizona, Tucson, Arizona, United States of America
- Cindy Cowen
  - The University of Arizona, Tucson, Arizona, United States of America
- Michael Culshaw-Maurer
  - The University of Arizona, Tucson, Arizona, United States of America
  - The Carpentries, Oakland, California, United States of America
- Robert Davey
  - The Carpentries, Oakland, California, United States of America
  - Earlham Institute, Norwich, United Kingdom
- Sean Davey
  - The University of Arizona, Tucson, Arizona, United States of America
- Upendra Devisetty
  - The University of Arizona, Tucson, Arizona, United States of America
  - Greenlight Biosciences, Durham, North Carolina, United States of America
- Tony Edgin
  - The University of Arizona, Tucson, Arizona, United States of America
- Andy Edmonds
  - The University of Arizona, Tucson, Arizona, United States of America
- Dmitry Fedorov
  - ViQI Inc., Santa Barbara, California, United States of America
- Jeremy Frady
  - The University of Arizona, Tucson, Arizona, United States of America
- John Fonner
  - Texas Advanced Computing Center, Austin, Texas, United States of America
- Jeffrey K. Gillan
  - The University of Arizona, Tucson, Arizona, United States of America
- Iqbal Hossain
  - The University of Arizona, Tucson, Arizona, United States of America
- Blake Joyce
  - The University of Arizona, Tucson, Arizona, United States of America
- Tina Lee
  - The University of Arizona, Tucson, Arizona, United States of America
- Shelley Littin
  - The University of Arizona, Tucson, Arizona, United States of America
- Ian McEwen
  - The University of Arizona, Tucson, Arizona, United States of America
- Nirav Merchant
  - The University of Arizona, Tucson, Arizona, United States of America
- David Micklos
  - DNA Learning Center, Cold Spring Harbor Laboratory, Long Island, New York, United States of America
- Andrew Nelson
  - Boyce Thompson Institute, Ithaca, New York, United States of America
- Ashley Ramsey
  - The University of Arizona, Tucson, Arizona, United States of America
- Sarah Roberts
  - The University of Arizona, Tucson, Arizona, United States of America
- Paul Sarando
  - The University of Arizona, Tucson, Arizona, United States of America
- Edwin Skidmore
  - The University of Arizona, Tucson, Arizona, United States of America
- Jawon Song
  - Texas Advanced Computing Center, Austin, Texas, United States of America
- Sriram Srinivasan
  - The University of Arizona, Tucson, Arizona, United States of America
- Dan Stanzione
  - Texas Advanced Computing Center, Austin, Texas, United States of America
- Sarah Stryeck
  - Graz University of Technology, Graz, Austria
  - Know Center GmbH, Graz, Austria
- Reetu Tuteja
  - The University of Arizona, Tucson, Arizona, United States of America
  - Greenlight Biosciences, Durham, North Carolina, United States of America
- Matthew Vaughn
  - Texas Advanced Computing Center, Austin, Texas, United States of America
- Mojib Wali
  - Graz University of Technology, Graz, Austria
- Mariah Wall
  - The University of Arizona, Tucson, Arizona, United States of America
- Ramona Walls
  - The University of Arizona, Tucson, Arizona, United States of America
  - Critical Path Institute, Tucson, Arizona, United States of America
- Liya Wang
  - DNA Learning Center, Cold Spring Harbor Laboratory, Long Island, New York, United States of America
- Todd Wickizer
  - The University of Arizona, Tucson, Arizona, United States of America
- Jason Williams
  - DNA Learning Center, Cold Spring Harbor Laboratory, Long Island, New York, United States of America
- Eric Lyons
  - The University of Arizona, Tucson, Arizona, United States of America
3
Inclusive collaboration across plant physiology and genomics: Now is the time! PLANT DIRECT 2023; 7:e493. [PMID: 37214275 PMCID: PMC10192722 DOI: 10.1002/pld3.493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 03/16/2023] [Accepted: 03/27/2023] [Indexed: 05/24/2023]
Abstract
Within the broad field of plant sciences, what are the most pressing challenges and opportunities to advance? Answers to this question usually include food and nutritional security, climate change mitigation, adaptation of plants to changing climates, preservation of biodiversity and ecosystem services, production of plant-based proteins and products, and growth of the bioeconomy. Genes and the processes their products carry out create differences in how plants grow, develop, and behave, and thus, the key solutions to these challenges lie squarely in the space where plant genomics and physiology intersect. Advancements in genomics, phenomics, and analysis tools have generated massive datasets, but these data are complex and have not always generated scientific insights at the anticipated pace. Further, new tools may need to be created or adapted, and field-relevant applications tested, to advance scientific discovery derived from such datasets. Meaningful, relevant conclusions and connections from genomics and plant physiological and biochemical data require both subject matter expertise and the collaborative skills needed to work together outside of specific disciplines. Bringing the best expertise to bear on complex problems in plant sciences requires enhanced, inclusive, and sustained collaboration across disciplines. However, despite significant efforts to enable and sustain collaborative research, a variety of challenges persist. Here, we present the outcomes and conclusions of two workshops convened to address the need for collaboration between scientists engaged in plant physiology, genetics, and genomics and to discuss the approaches that will create the necessary environments to support successful collaboration. We conclude with approaches to share and reward collaboration and the need to train inclusive scientists that will have the skills to thrive in interdisciplinary contexts.
4
Hou Q, Waury K, Gogishvili D, Feenstra KA. Ten quick tips for sequence-based prediction of protein properties using machine learning. PLoS Comput Biol 2022; 18:e1010669. [PMID: 36454728 PMCID: PMC9714715 DOI: 10.1371/journal.pcbi.1010669] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/03/2022] Open
Abstract
The ubiquitous availability of genome sequencing data explains the popularity of machine learning-based methods for the prediction of protein properties from their amino acid sequences. Over the years, while revising our own work, reading submitted manuscripts as well as published papers, we have noticed several recurring issues, which make some reported findings hard to understand and replicate. We suspect this may be due to biologists being unfamiliar with machine learning methodology, or conversely, machine learning experts may miss some of the knowledge needed to correctly apply their methods to proteins. Here, we aim to bridge this gap for developers of such methods. The most striking issues are linked to a lack of clarity: how were annotations of interest obtained; which benchmark metrics were used; how are positives and negatives defined. Others relate to a lack of rigor: If you sneak in structural information, your method is not sequence-based; if you compare your own model to "state-of-the-art," take the best methods; if you want to conclude that some method is better than another, obtain a significance estimate to support this claim. These, and other issues, we will cover in detail. These points may have seemed obvious to the authors during writing; however, they are not always clear-cut to the readers. We also expect many of these tips to hold for other machine learning-based applications in biology. Therefore, many computational biologists who develop methods in this particular subject will benefit from a concise overview of what to avoid and what to do instead.
Affiliation(s)
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong, P. R. China
  - National Institute of Health Data Science of China, Shandong University, Shandong, P. R. China
- Katharina Waury
  - Department of Computer Science, Bioinformatics Group, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Dea Gogishvili
  - Department of Computer Science, Bioinformatics Group, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- K. Anton Feenstra
  - Department of Computer Science, Bioinformatics Group, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
5
Gaynor KM, Azevedo T, Boyajian C, Brun J, Budden AE, Cole A, Csik S, DeCesaro J, Do-Linh H, Dudney J, Galaz García C, Leonard S, Lyon NJ, Marks A, Parish J, Phillips AA, Scarborough C, Smith J, Thompson M, Vargas Poulsen C, Fong CR. Ten simple rules to cultivate belonging in collaborative data science research teams. PLoS Comput Biol 2022; 18:e1010567. [PMID: 36327241 PMCID: PMC9632775 DOI: 10.1371/journal.pcbi.1010567] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022] Open
Affiliation(s)
- Kaitlyn M. Gaynor
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
  - Departments of Zoology and Botany, University of British Columbia, Vancouver, British Columbia, Canada
- Therese Azevedo
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Clarissa Boyajian
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Julien Brun
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Amber E. Budden
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
  - Main Library, University of California Santa Barbara, Santa Barbara, California, United States of America
- Allie Cole
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Samantha Csik
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Joe DeCesaro
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Halina Do-Linh
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Joan Dudney
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Carmen Galaz García
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Scout Leonard
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Nicholas J. Lyon
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Althea Marks
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Julia Parish
  - Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, California, United States of America
- Alexandra A. Phillips
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Courtney Scarborough
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Joshua Smith
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Marcus Thompson
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Camila Vargas Poulsen
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
- Caitlin R. Fong
  - National Center for Ecological Analysis and Synthesis, University of California Santa Barbara, Santa Barbara, California, United States of America
6
Abstract
Citations remain a prime, yet controversial, measure of academic performance. Ideally, how often a paper is cited should solely depend on the quality of the science reported therein. However, non-scientific factors, including structural elements (e.g., length of abstract, number of references) or attributes of authors (e.g., prestige and gender), may all influence citation outcomes. Knowing the predicted effect of these features on citations might make it possible to ‘game the system’ of citation counts when writing a paper. We conducted a meta-analysis to build a quantitative understanding of the effect of similar non-scientific features on the impact of scientific articles in terms of citations. We showed that article length, number of authors, author experience and their collaboration network, Impact Factors, availability as open access, online sharing, different referencing practice, and number of figures all exerted a positive influence on citations. These patterns were consistent across most disciplines. We also documented temporal trends towards a recent increase in the effect of journal Impact Factor and number of authors on citations. We suggest that our approach can be used as a benchmark to monitor the influence of these effects over time, minimising the influence of non-scientific features as a means to game the system of citation counts, and thus enhancing their usefulness as a measure of scientific quality.