1
Wagner R, Becker MM, Balazinski M, Schnabel U, Below H, Yordanova K, Waltemath D. LAMAS 4 IC - Laboratory approved metadata acquisition schemas for ion chromatography. J Chromatogr B Analyt Technol Biomed Life Sci 2025; 1256:124556. [PMID: 40112686 DOI: 10.1016/j.jchromb.2025.124556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Revised: 03/04/2025] [Accepted: 03/04/2025] [Indexed: 03/22/2025]
Abstract
Metadata are necessary to describe scientific experiments. The use of metadata schemas enables the collection of structured and standardized metadata, supporting the findability, accessibility, interoperability and reusability of the data and metadata in accordance with the FAIR data principles. This paper introduces the first metadata schemas for the analytical methods of anion and cation ion-exchange chromatography measurements. The developed schemas are aligned with the ASTM E1151:1993 norm defining terms and relationships in ion chromatography. They are implemented by using the common JSON schema standard and publicly shared for direct use and further refinement for various application fields of IC measurements. Approval of the schema in laboratory environments was achieved in the field of applied plasma science and two examples from different use cases of water analysis for plasma applications are presented. Through practical application, the introduced schemas have demonstrated their effectiveness in laboratory environments, marking a step forward in standardizing the documentation of IC measurements to support advanced research data management and the application of modern data science methods.
Affiliation(s)
- Robert Wagner
  - Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Germany
- Markus M Becker
  - Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Germany
- Martina Balazinski
  - Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Germany
- Uta Schnabel
  - Leibniz Institute for Plasma Science and Technology (INP), Greifswald, Germany
- Kristina Yordanova
  - University of Greifswald, Institute for Data Science, Greifswald, Germany
- Dagmar Waltemath
  - University Medicine Greifswald, Medical Informatics Laboratory, Greifswald, Germany
2
Fuster-Calvo A, Valentin S, Tamayo WC, Gravel D. Evaluating the feasibility of automating dataset retrieval for biodiversity monitoring. PeerJ 2025; 13:e18853. [PMID: 39897501 PMCID: PMC11786708 DOI: 10.7717/peerj.18853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 12/20/2024] [Indexed: 02/04/2025] Open
Abstract
Aim: Effective management strategies for conserving biodiversity and mitigating the impacts of global change rely on access to comprehensive and up-to-date biodiversity data. However, manual search, retrieval, evaluation, and integration of this information into databases present a significant challenge to keeping pace with the rapid influx of large amounts of data, hindering its utility in contemporary decision-making processes. Automating these tasks through advanced algorithms holds immense potential to revolutionize biodiversity monitoring. Innovation: In this study, we investigate the potential for automating the retrieval and evaluation of biodiversity data from Dryad and Zenodo repositories. We have designed an evaluation system based on various criteria, including the type of data provided and its spatio-temporal range, and applied it to manually assess the relevance for biodiversity monitoring of datasets retrieved through an application programming interface (API). We evaluated a supervised classification to identify potentially relevant datasets and investigated the feasibility of automatically ranking the relevance. Additionally, we applied the same approach to a scientific literature source, using data from Semantic Scholar for reference. Our evaluation centers on the database utilized by a national biodiversity monitoring system in Quebec, Canada. Main conclusions: We retrieved 89 (55%) relevant datasets for our database, showing the value of automated dataset search in repositories. Additionally, we find that scientific publication sources offer broader temporal coverage and can serve as conduits guiding researchers toward other valuable data sources. Our automated classification system showed moderate performance in detecting relevant datasets (with an F-score up to 0.68) and signs of overfitting, emphasizing the need for further refinement. A key challenge identified in our manual evaluation is the scarcity and uneven distribution of metadata in the texts, especially pertaining to spatial and temporal extents. Our evaluative framework, based on predefined criteria, can be adopted by automated algorithms for streamlined prioritization, and we make our manually evaluated data publicly available, serving as a benchmark for improving classification techniques.
Affiliation(s)
- Sarah Valentin
  - Joint Research Unit Land, Remote Sensing and Spatial Information (UMR TETIS), French Agricultural Research Centre for International Development (CIRAD), Montpellier, France
- William C. Tamayo
  - Biology Department, University of Sherbrooke, Sherbrooke, Quebec, Canada
- Dominique Gravel
  - Biology Department, University of Sherbrooke, Sherbrooke, Quebec, Canada
3
Li Q, Liu C, Hou J, Wang P. Affective memories and perceived value: motivators and inhibitors of the data search-access process. JOURNAL OF DOCUMENTATION 2023. [DOI: 10.1108/jd-06-2022-0129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/15/2023]
Abstract
Purpose: As an emerging tool for data discovery, data retrieval systems fail to effectively support users' cognitive processes during data search and access. To uncover the relationship between data search and access and the cognitive mechanisms underlying this relationship, this paper examines the associations between affective memories, perceived value, search effort and the intention to access data during users' interactions with data retrieval systems. Design/methodology/approach: This study conducted a user experiment for which 48 doctoral students from different disciplines were recruited. The authors collected search logs, screen recordings, questionnaires and eye movement data during the interactive data search. Multiple linear regression was used to test the hypotheses. Findings: The results indicate that positive affective memories positively affect perceived value, while the effects of negative affective memories on perceived value are nonsignificant. Utility value positively affects search effort, while attainment value negatively affects search effort. Moreover, search effort partially positively affects the intention to access data, and it serves a full mediating role in the effects of utility value and attainment value on the intention to access data. Originality/value: Through the comparison between the findings of this study and relevant findings in information search studies, this paper reveals the specificity of behaviour and cognitive processes during data search and access and the special characteristics of data discovery tasks. It sheds light on the inhibiting effect of attainment value and the motivating effect of utility value on data search and the intention to access data. Moreover, this paper provides new insights into the role of memory bias in the relationships between affective memories and data searchers' perceived value.
4
Wang P, Yang H, Hou J, Li Q. A machine learning approach to primacy-peak-recency effect-based satisfaction prediction. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
5
Peters K, Blatt-Janmaat KL, Tkach N, van Dam NM, Neumann S. Untargeted Metabolomics for Integrative Taxonomy: Metabolomics, DNA Marker-Based Sequencing, and Phenotype Bioimaging. PLANTS (BASEL, SWITZERLAND) 2023; 12:881. [PMID: 36840229 PMCID: PMC9965764 DOI: 10.3390/plants12040881] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/23/2022] [Revised: 02/07/2023] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Integrative taxonomy is a fundamental part of biodiversity and combines traditional morphology with additional methods such as DNA sequencing or biochemistry. Here, we aim to establish untargeted metabolomics for use in chemotaxonomy. We used three thallose liverwort species Riccia glauca, R. sorocarpa, and R. warnstorfii (order Marchantiales, Ricciaceae) with Lunularia cruciata (order Marchantiales, Lunulariaceae) as an outgroup. Liquid chromatography high-resolution mass-spectrometry (UPLC/ESI-QTOF-MS) with data-dependent acquisition (DDA-MS) was integrated with DNA marker-based sequencing of the trnL-trnF region and high-resolution bioimaging. Our untargeted chemotaxonomy methodology enables us to distinguish taxa based on chemophenetic markers at different levels of complexity: (1) molecules, (2) compound classes, (3) compound superclasses, and (4) molecular descriptors. For the investigated Riccia species, we identified 71 chemophenetic markers at the molecular level, a characteristic composition in 21 compound classes, and 21 molecular descriptors largely indicating electron state, presence of chemical motifs, and hydrogen bonds. Our untargeted approach revealed many chemophenetic markers at different complexity levels that can provide more mechanistic insight into phylogenetic delimitation of species within a clade than genetic-based methods coupled with traditional morphology-based information. However, analytical and bioinformatics analysis methods still need to be better integrated to link the chemophenetic information at multiple scales.
Affiliation(s)
- Kristian Peters
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany
  - Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Am Kirchtor 1, 06108 Halle, Germany
  - Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
- Kaitlyn L. Blatt-Janmaat
  - Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
  - Department of Chemistry, University of New Brunswick, Fredericton, NB E3B 5A3, Canada
- Natalia Tkach
  - Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Am Kirchtor 1, 06108 Halle, Germany
- Nicole M. van Dam
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany
  - Institute of Biodiversity, Friedrich Schiller University Jena, Dornburgerstraße 159, 07743 Jena, Germany
  - Plants Biotic Interactions, Leibniz Institute of Vegetable and Ornamental Crops (IGZ), Theodor-Echtermeyer-Weg 1, 14979 Großbeeren, Germany
- Steffen Neumann
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstrasse 4, 04103 Leipzig, Germany
  - Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Am Kirchtor 1, 06108 Halle, Germany
6
Abdelmageed N, Löffler F, Feddoul L, Algergawy A, Samuel S, Gaikwad J, Kazem A, König-Ries B. BiodivNERE: Gold standard corpora for named entity recognition and relation extraction in the biodiversity domain. Biodivers Data J 2022; 10:e89481. [PMID: 36761617 PMCID: PMC9836593 DOI: 10.3897/bdj.10.e89481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 09/07/2022] [Indexed: 11/12/2022] Open
Abstract
Background: Biodiversity is the assortment of life on earth covering evolutionary, ecological, biological, and social forms. To preserve life in all its variety and richness, it is imperative to monitor the current state of biodiversity and its change over time and to understand the forces driving it. This need has resulted in numerous works being published in this field. With this, a large amount of textual data (publications) and metadata (e.g. dataset description) has been generated. To support the management and analysis of these data, two techniques from computer science are of interest, namely Named Entity Recognition (NER) and Relation Extraction (RE). While the former enables better content discovery and understanding, the latter fosters the analysis by detecting connections between entities and, thus, allows us to draw conclusions and answer relevant domain-specific questions. To automatically predict entities and their relations, machine/deep learning techniques could be used. The training and evaluation of those techniques require labelled corpora. New information: In this paper, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from biodiversity datasets metadata and abstracts that can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. These corpora are manually labelled and verified by biodiversity experts. In addition, we explain the detailed steps of constructing these datasets. Moreover, we demonstrate the underlying ontology for the classes and relations used to annotate such corpora.
Affiliation(s)
- Nora Abdelmageed
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
  - Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
- Felicitas Löffler
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Leila Feddoul
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Alsayed Algergawy
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Sheeba Samuel
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
  - Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
- Jitendra Gaikwad
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
- Anahita Kazem
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
  - German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Birgitta König-Ries
  - Heinz Nixdorf Chair for Distributed Information Systems, Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany
  - Michael-Stifel-Center for Data-Driven and Simulation Science, Jena, Germany
  - German Center for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
7
Peters K, König-Ries B. Reference bioimaging to assess the phenotypic trait diversity of bryophytes within the family Scapaniaceae. Sci Data 2022; 9:598. [PMID: 36195605 PMCID: PMC9532418 DOI: 10.1038/s41597-022-01691-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Accepted: 09/08/2022] [Indexed: 11/18/2022] Open
Abstract
Macro- and microscopic images of organisms are pivotal in biodiversity research. Although bioimages have manifold applications such as assessing the diversity of form and function, FAIR bioimaging data in the context of biodiversity are still very scarce, especially for difficult taxonomic groups such as bryophytes. Here, we present a high-quality reference dataset containing macroscopic and bright-field microscopic images documenting various phenotypic characters of the species belonging to the liverwort family of Scapaniaceae occurring in Europe. To encourage data reuse in biodiversity and adjacent research areas, we annotated the imaging data with machine-actionable metadata using community-accepted semantics. Furthermore, raw imaging data are retained and any contextual image processing like multi-focus image fusion and stitching was documented to foster good scientific practices through source tracking and provenance. The information contained in the raw images is also of particular interest for machine learning and image segmentation used in bioinformatics and computational ecology. We expect that this richly annotated reference dataset will encourage future studies to follow our principles.
Affiliation(s)
- Kristian Peters
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, 04103, Leipzig, Germany
  - Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Am Kirchtor 1, 06108, Halle (Saale), Germany
  - Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle (Saale), Germany
- Birgitta König-Ries
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstraße 4, 04103, Leipzig, Germany
  - Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University, Jena, Germany
  - Michael Stifel Center Jena, Jena, Germany
8
Jiao C, Li K, Fang Z. Data sharing practices across knowledge domains: A dynamic examination of data availability statements in PLOS ONE publications. J Inf Sci 2022. [DOI: 10.1177/01655515221101830] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
As the importance of research data gradually grows in sciences, data sharing has come to be encouraged and even mandated by journals and funders in recent years. Following this trend, the data availability statement has been increasingly embraced by academic communities as a means of sharing research data as part of research articles. This article presents a quantitative study of which mechanisms and repositories are used to share research data in PLOS ONE articles. We offer a dynamic examination of this topic from the disciplinary and temporal perspectives based on all statements in English-language research articles published between 2014 and 2020 in the journal. We find a slow yet steady growth in the use of data repositories to share data over time, as opposed to sharing data in the article and/or supplementary materials; this indicates improved compliance with the journal’s data sharing policies. We also find that multidisciplinary data repositories have been increasingly used over time, whereas some disciplinary repositories show a decreasing trend. Our findings can help academic publishers and funders to improve their data sharing policies and serve as an important baseline dataset for future studies on data sharing activities.
Affiliation(s)
- Chenyue Jiao
  - School of Information Sciences, University of Illinois Urbana-Champaign, USA
- Kai Li
  - School of Information Resource Management, Renmin University of China, China
- Zhichao Fang
  - Centre for Science and Technology Studies, Leiden University, The Netherlands
9
Schapiro D, Yapp C, Sokolov A, Reynolds SM, Chen YA, Sudar D, Xie Y, Muhlich J, Arias-Camison R, Arena S, Taylor AJ, Nikolov M, Tyler M, Lin JR, Burlingame EA, Chang YH, Farhi SL, Thorsson V, Venkatamohan N, Drewes JL, Pe'er D, Gutman DA, Herrmann MD, Gehlenborg N, Bankhead P, Roland JT, Herndon JM, Snyder MP, Angelo M, Nolan G, Swedlow JR, Schultz N, Merrick DT, Mazzili SA, Cerami E, Rodig SJ, Santagata S, Sorger PK. MITI minimum information guidelines for highly multiplexed tissue images. Nat Methods 2022; 19:262-267. [PMID: 35277708 PMCID: PMC9009186 DOI: 10.1038/s41592-022-01415-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 01/25/2023]
Abstract
The imminent release of tissue atlases combining multi-channel microscopy with single cell sequencing and other omics data from normal and diseased specimens creates an urgent need for data and metadata standards that guide data deposition, curation and release. We describe a Minimum Information about highly multiplexed Tissue Imaging (MITI) standard that applies best practices developed for genomics and other microscopy data to highly multiplexed tissue images and traditional histology.
Affiliation(s)
- Denis Schapiro
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University Hospital and Heidelberg University, Heidelberg, Germany
  - Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Clarence Yapp
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
  - Image and Data Analysis Core, Harvard Medical School, Boston, MA, USA
- Artem Sokolov
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Yu-An Chen
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
- Damir Sudar
  - Quantitative Imaging Systems LLC, Portland, OR, USA
- Yubin Xie
  - Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jeremy Muhlich
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
- Raquel Arias-Camison
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
- Sarah Arena
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
- Madison Tyler
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
- Jia-Ren Lin
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
- Erik A Burlingame
  - Oregon Health and Science University, Portland, OR, USA
  - Indica Labs, Albuquerque, NM, USA
- Young H Chang
  - Oregon Health and Science University, Portland, OR, USA
- Samouil L Farhi
  - Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Julia L Drewes
  - Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Dana Pe'er
  - Program in Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Markus D Herrmann
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Nils Gehlenborg
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Peter Bankhead
  - Edinburgh Pathology, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Joseph T Roland
  - Vanderbilt University School of Medicine, Nashville, TN, USA
- John M Herndon
  - Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Michael Angelo
  - School of Medicine, Stanford University, Stanford, CA, USA
- Garry Nolan
  - School of Medicine, Stanford University, Stanford, CA, USA
- Jason R Swedlow
  - Division of Computational Biology and Centre for Gene Regulation and Expression, University of Dundee, Dundee, UK
- Nikolaus Schultz
  - Department of Epidemiology & Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Scott J Rodig
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Sandro Santagata
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Peter K Sorger
  - Laboratory of Systems Pharmacology, Ludwig Center for Cancer Research at Harvard, Harvard Medical School, Boston, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
10
Heger T, Zarrieß S, Algergawy A, Jeschke J, König-Ries B. INAS: Interactive Argumentation Support for the Scientific Domain of Invasion Biology. RESEARCH IDEAS AND OUTCOMES 2022. [DOI: 10.3897/rio.8.e80457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Developing a precise argument is not an easy task. In real-world argumentation scenarios, arguments presented in texts (e.g. scientific publications) often constitute the end result of a long and tedious process. A lot of work on computational argumentation has focused on analyzing and aggregating these products of argumentation processes, i.e. argumentative texts. In this project, we adopt a complementary perspective: we aim to develop an argumentation machine that supports users during the argumentation process in a scientific context, enabling them to follow ongoing argumentation in a scientific community and to develop their own arguments. To achieve this ambitious goal, we will focus on a particular phase of the scientific argumentation process, namely the initial phase of claim or hypothesis development. According to argumentation theory, the starting point of an argument is a claim, and also data that serves as a basis for the claim. In scientific argumentation, a carefully developed and thought-through hypothesis (which we see as Toulmin's "claim" in a scientific context) is often crucial for researchers to be able to conduct a successful study and, in the end, present a new, high-quality finding or argument. Thus, an initial hypothesis needs to be specific enough that a researcher can test it based on data, but, at the same time, it should also relate to previous general claims made in the community. We investigate how argumentation machines can (i) represent concrete and more abstract knowledge on hypotheses and their underlying concepts, (ii) model the process of hypothesis refinement, including data as a basis of refinement, and (iii) interactively support a user in developing her own hypothesis based on these resources. This project will combine methods from different disciplines: natural language processing, knowledge representation and semantic web, philosophy of science and, as an example of a scientific domain, invasion biology.
Our starting point is an existing resource in invasion biology that organizes and relates core hypotheses in the field and associates them with metadata for more than 1000 scientific publications, which was developed over the course of several years based on manual analysis. This network, however, is currently static (i.e. needs substantial manual curation to be extended to incorporate new claims) and, moreover, is not easily accessible for users who lack specific background and domain knowledge in invasion biology. Our goal is to develop (i) a semantic model for representing knowledge on concepts and hypotheses, so that non-expert users can also use the network; (ii) a tool that automatically computes links from publication abstracts (and data) to these hypotheses; and (iii) an interactive system that supports users in refining their initial, potentially underdeveloped hypothesis.
11
Lücking A, Driller C, Stoeckel M, Abrami G, Pachzelt A, Mehler A. Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology. LANG RESOUR EVAL 2021. [DOI: 10.1007/s10579-021-09553-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Biodiversity information is contained in countless digitized and unprocessed scholarly texts. Although automated extraction of these data has been gaining momentum for years, there are still innumerable text sources that are poorly accessible and require a more advanced range of methods to extract relevant information. To improve the access to semantic biodiversity information, we have launched the BIOfid project (www.biofid.de) and have developed a portal to access the semantics of German language biodiversity texts, mainly from the 19th and 20th century. However, to make such a portal work, a couple of methods had to be developed or adapted first. In particular, text-technological information extraction methods were needed, which extract the required information from the texts. Such methods draw on machine learning techniques, which in turn are trained by learning data. To this end, among others, we gathered the bio text corpus, which is a cooperatively built resource, developed by biologists, text technologists, and linguists. A special feature of bio is its multiple annotation approach, which takes into account both general and biology-specific classifications, and by this means goes beyond previous, typically taxon- or ontology-driven proper name detection. We describe the design decisions and the genuine Annotation Hub Framework underlying the bio annotations and present agreement results. The tools used to create the annotations are introduced, and the use of the data in the semantic portal is described. Finally, some general lessons, in particular with multiple annotation projects, are drawn.
12
Löffler F, Schuldt A, König-Ries B, Bruelheide H, Klan F. A Test Collection for Dataset Retrieval in Biodiversity Research. RESEARCH IDEAS AND OUTCOMES 2021. [DOI: 10.3897/rio.7.e67887] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
Searching for scientific datasets is a prominent task in scholars' daily research practice. A variety of data publishers, archives and data portals offer search applications that allow the discovery of datasets. The evaluation of such dataset retrieval systems requires proper test collections, including questions that reflect real-world information needs of scholars, a set of datasets and human judgements assessing the relevance of the datasets to the questions in the benchmark corpus. Unfortunately, only very few test collections exist for dataset search. In this paper, we introduce the BEF-China test collection, the very first test collection for dataset retrieval in biodiversity research, a research field with an increasing demand for data discovery services. The test collection consists of 14 questions, a corpus of 372 datasets from the BEF-China project and binary relevance judgements provided by a biodiversity expert.