1
Washington P. A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health. J Med Internet Res 2024; 26:e51138. [PMID: 38602750 PMCID: PMC11046386 DOI: 10.2196/51138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 11/15/2023] [Accepted: 01/30/2024] [Indexed: 04/12/2024] Open
Abstract
Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power is both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this emerging field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems. With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.
Affiliation(s)
- Peter Washington
- Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, HI, United States
2
Cincilla G, Masoni S, Blobel J. Individual and collective human intelligence in drug design: evaluating the search strategy. J Cheminform 2021; 13:80. [PMID: 34635158 PMCID: PMC8507178 DOI: 10.1186/s13321-021-00556-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/30/2021] [Accepted: 09/18/2021] [Indexed: 11/10/2022] Open
Abstract
In recent years, individual and collective human intelligence, defined as the knowledge, skills, reasoning and intuition of individuals and groups, have been used in combination with computer algorithms to solve complex scientific problems. Such an approach has been used successfully in research fields such as structural biology, comparative genomics, macromolecular crystallography and RNA design. Herein we describe an attempt to use a similar approach in small-molecule drug discovery, specifically to drive search strategies of de novo drug design. This is assessed with a case study that consists of a series of public experiments in which participants had to explore the huge chemical space in silico to find predefined compounds by designing molecules and analyzing the score associated with them. Such a process may be seen as an instantaneous surrogate of the classical design-make-test cycles carried out by medicinal chemists during the hit-to-lead phase of drug discovery, but one not hindered by long synthesis and testing times. We present first findings on (1) assessing human intelligence in chemical space exploration, (2) comparing individual and collective human intelligence performance in this task and (3) contrasting some human and artificial intelligence achievements in de novo drug design.
Affiliation(s)
- Giovanni Cincilla
- Molomics, Barcelona Science Park, c/Baldiri i Reixac 4-12, 08028, Barcelona, Spain.
- Simone Masoni
- Molomics, Barcelona Science Park, c/Baldiri i Reixac 4-12, 08028, Barcelona, Spain.
- Jascha Blobel
- Molomics, Barcelona Science Park, c/Baldiri i Reixac 4-12, 08028, Barcelona, Spain.
3
Das R, Keep B, Washington P, Riedel-Kruse IH. Scientific Discovery Games for Biomedical Research. Annu Rev Biomed Data Sci 2019; 2:253-279. [PMID: 34308269 DOI: 10.1146/annurev-biodatasci-072018-021139] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/14/2023]
Abstract
Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.
Affiliation(s)
- Rhiju Das
- Department of Biochemistry and Department of Physics, Stanford University, Stanford, California 94305, USA
- Benjamin Keep
- Department of Learning Sciences, Stanford University, Stanford, California 94305, USA
- Peter Washington
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
4
Jenkinson J. Molecular Biology Meets the Learning Sciences: Visualizations in Education and Outreach. J Mol Biol 2018; 430:4013-4027. [DOI: 10.1016/j.jmb.2018.08.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/01/2018] [Revised: 08/10/2018] [Accepted: 08/22/2018] [Indexed: 10/28/2022]
5
Boutron I, Ravaud P. Misrepresentation and distortion of research in biomedical literature. Proc Natl Acad Sci U S A 2018; 115:2613-2619. [PMID: 29531025 PMCID: PMC5856510 DOI: 10.1073/pnas.1710755115] [Citation(s) in RCA: 147] [Impact Index Per Article: 21.0] [Indexed: 12/12/2022] Open
Abstract
Publication in peer-reviewed journals is an essential step in the scientific process. However, publication is not simply the reporting of facts arising from a straightforward analysis thereof. Authors have broad latitude when writing their reports and may be tempted to consciously or unconsciously "spin" their study findings. Spin has been defined as a specific intentional or unintentional reporting that fails to faithfully reflect the nature and range of findings and that could affect the impression the results produce in readers. This article, based on a literature review, reports the various practices of spin from misreporting by "beautification" of methods to misreporting by misinterpreting the results. It provides data on the prevalence of some forms of spin in specific fields and the possible effects of some types of spin on readers' interpretation and research dissemination. We also discuss why researchers would spin their reports and possible ways to avoid it.
Affiliation(s)
- Isabelle Boutron
- Methods of Therapeutic Evaluation Of Chronic Diseases (METHODS) team, INSERM, UMR 1153, Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), F-75014 Paris, France
- Faculté de Médicine, Paris Descartes University, 75006 Paris, France
- Centre d'Épidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, 75004 Paris, France
- Philippe Ravaud
- Methods of Therapeutic Evaluation Of Chronic Diseases (METHODS) team, INSERM, UMR 1153, Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), F-75014 Paris, France
- Faculté de Médicine, Paris Descartes University, 75006 Paris, France
- Centre d'Épidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, 75004 Paris, France
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY 10032
6
Woods CT, Laederach A. Classification of RNA structure change by 'gazing' at experimental data. Bioinformatics 2018; 33:1647-1655. [PMID: 28130241 PMCID: PMC5447233 DOI: 10.1093/bioinformatics/btx041] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/14/2016] [Accepted: 01/20/2017] [Indexed: 11/12/2022] Open
Abstract
Motivation: Mutations (or Single Nucleotide Variants) in folded RiboNucleic Acid structures that cause local or global conformational change are riboSNitches. Predicting riboSNitches is challenging, as it requires making two, albeit related, structure predictions. The data most often used to experimentally validate riboSNitch predictions is Selective 2' Hydroxyl Acylation by Primer Extension, or SHAPE. Experimentally establishing a riboSNitch requires the quantitative comparison of two SHAPE traces: wild-type (WT) and mutant. Historically, SHAPE data was collected on electropherograms and change in structure was evaluated by 'gel gazing.' SHAPE data is now routinely collected with next generation sequencing and/or capillary sequencers. We aim to establish a classifier capable of simulating human 'gazing' by identifying features of the SHAPE profile that human experts agree 'look' like a riboSNitch.
Results: We find strong quantitative agreement between experts when RNA scientists 'gaze' at SHAPE data and identify riboSNitches. We identify dynamic time warping and seven other features predictive of the human consensus. The classSNitch classifier reported here accurately reproduces human consensus for 167 mutant/WT comparisons with an Area Under the Curve (AUC) above 0.8. When we analyze 2019 mutant traces for 17 different RNAs, we find that features of the WT SHAPE reactivity allow us to improve thermodynamic structure predictions of riboSNitches. This is significant, as accurate RNA structural analysis and prediction is likely to become an important aspect of precision medicine.
Availability and Implementation: The classSNitch R package is freely available at http://classsnitch.r-forge.r-project.org.
Contact: alain@email.unc.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chanin Tolson Woods
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Alain Laederach
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
7
Abstract
The generation of large-scale biomedical data is creating unprecedented opportunities for basic and translational science. Typically, the data producers perform initial analyses, but it is very likely that the most informative methods reside with other groups. Crowdsourcing the analysis of complex and massive data has emerged as a framework to find robust methodologies. When the crowdsourcing is done in the form of collaborative scientific competitions, known as Challenges, the validation of the methods is inherently addressed. Challenges also encourage open innovation, create collaborative communities to solve diverse and important biomedical problems, and foster the creation and dissemination of well-curated data repositories.
8
Moustafa K. Contributorships Are Not 'Weighable' to be Equal. Trends Biochem Sci 2016; 41:389-390. [PMID: 27025412 DOI: 10.1016/j.tibs.2016.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/08/2016] [Revised: 03/05/2016] [Accepted: 03/07/2016] [Indexed: 10/22/2022]
Abstract
A new trend of designating some authors as 'first co-authors' is noticeable in scientific publications, as a statement highlighting that two or more authors 'contributed equally' to a reported work. However, the requirements of scientific rigor, honesty, and accuracy in academic standards make such statements invalid; thus, they should be avoided. A potential solution is to specify the role of each co-author, from study conception to communication of results, and let readers judge the importance of each contribution by themselves. Alternatively, authors should demonstrate how they contributed 'equally' when they are defined as 'equal contributors'.
9
Anderson-Lee J, Fisker E, Kosaraju V, Wu M, Kong J, Lee J, Lee M, Zada M, Treuille A, Das R. Principles for Predicting RNA Secondary Structure Design Difficulty. J Mol Biol 2016; 428:748-757. [PMID: 26902426 PMCID: PMC4833017 DOI: 10.1016/j.jmb.2015.11.013] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 08/16/2015] [Revised: 11/04/2015] [Accepted: 11/10/2015] [Indexed: 11/27/2022]
Abstract
Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures might be tractable for downstream sequence design, increasing the time and expense of design efforts due to inefficient secondary structure choices. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests through three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled an Eterna100 benchmark of 100 secondary structure design challenges that span a large range in design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess "designability" of single RNA structures, as well as of switches for in vitro and in vivo applications.
Affiliation(s)
- Vineet Kosaraju
- Eterna Massive Open Laboratory; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Michelle Wu
- Eterna Massive Open Laboratory; Program in Biomedical Informatics, Stanford University, Stanford, CA 94305, USA
- Justin Kong
- Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jeehyung Lee
- Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Minjae Lee
- Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Adrien Treuille
- Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Rhiju Das
- Eterna Massive Open Laboratory; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA; Department of Physics, Stanford University, Stanford, CA 94305, USA.
10
Khare R, Good BM, Leaman R, Su AI, Lu Z. Crowdsourcing in biomedicine: challenges and opportunities. Brief Bioinform 2015; 17:23-32. [PMID: 25888696 DOI: 10.1093/bib/bbv021] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 01/06/2023] Open
Abstract
The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.