1
He YQ, McDonough LK, Zainab SM, Guo ZF, Chen C, Xu YY. Microplastic accumulation in groundwater: Data-scaled insights and future research. Water Research 2024; 258:121808. [PMID: 38796912 DOI: 10.1016/j.watres.2024.121808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 05/10/2024] [Accepted: 05/19/2024] [Indexed: 05/29/2024]
Abstract
Given growing concern, reflected in an increasing number of publications, about the risks that microplastics (MPs) in groundwater pose to humans and ecosystems, this study develops a Contrasting Analysis of Scales (CAS) approach to synthesize all existing data into a hierarchical understanding of MP accumulation in groundwater. Across the full dataset of 386 compiled samples, the median abundances of MPs in Open Groundwater (OG) and Closed Groundwater (CG) were 4.4 and 2.5 items/L, respectively, with OG exhibiting a greater diversity of MP colors and larger particle sizes. This difference arises from the different pathways by which MPs enter OG and CG (i.e., surface runoff and rock interstices). At the regional scale, median MP abundances in nature reserves and landfills were 17.5 and 13.4 items/L, respectively, and all sampling points showed high pollution load risk. MPs in agricultural areas exhibited a high coefficient of variation (716.7%) and a median abundance of 1.0 items/L. Anthropogenic activities at the regional scale drive the differentiation in the morphological characteristics of MPs, and groundwater in residential areas containing highly toxic polymers (e.g., polyvinylchloride) deserves prolonged attention. At the local scale, the transport of MPs is controlled by groundwater flow paths, with a higher abundance of MP particles downstream than upstream; MPs with regular surfaces and lower resistance (e.g., pellets) are more likely to be transported over long distances. Based on these data-scaled insights into MP accumulation, future research should be directed towards network-based observation in groundwater-rich regions covered with landfills, residences, and agricultural land.
Affiliation(s)
- Yu-Qin He
- Key Laboratory of Urban Environment and Health, Ningbo Observation and Research Station, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Liza K McDonough
- Australian Nuclear Science and Technology Organisation (ANSTO), New Illawarra Rd, Lucas Heights, NSW 2234, Australia
- Syeda Maria Zainab
- Key Laboratory of Urban Environment and Health, Ningbo Observation and Research Station, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; Zhejiang Key Laboratory of Urban Environmental Processes and Pollution Control, CAS Haixi Industrial Technology Innovation Center in Beilun, Ningbo 315830, China
- Zhao-Feng Guo
- Key Laboratory of Urban Environment and Health, Ningbo Observation and Research Station, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; Zhejiang Key Laboratory of Urban Environmental Processes and Pollution Control, CAS Haixi Industrial Technology Innovation Center in Beilun, Ningbo 315830, China
- Cai Chen
- Key Laboratory of Urban Environment and Health, Ningbo Observation and Research Station, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yao-Yang Xu
- Key Laboratory of Urban Environment and Health, Ningbo Observation and Research Station, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; Zhejiang Key Laboratory of Urban Environmental Processes and Pollution Control, CAS Haixi Industrial Technology Innovation Center in Beilun, Ningbo 315830, China.
2
Tiemann JKS, Szczuka M, Bouarroudj L, Oussaren M, Garcia S, Howard RJ, Delemotte L, Lindahl E, Baaden M, Lindorff-Larsen K, Chavent M, Poulain P. MDverse: Shedding Light on the Dark Matter of Molecular Dynamics Simulations. bioRxiv: The Preprint Server for Biology 2024:2023.05.02.538537. [PMID: 37205542 PMCID: PMC10187166 DOI: 10.1101/2023.05.02.538537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
The rise of open science and the absence of a global dedicated data repository for molecular dynamics (MD) simulations have led to the accumulation of MD files in generalist data repositories, constituting the dark matter of MD: data that is technically accessible but neither indexed, curated, nor easily searchable. Leveraging an original search strategy, we found and indexed about 250,000 files and 2,000 datasets from Zenodo, Figshare and Open Science Framework. With a focus on files produced by the Gromacs MD software, we illustrate the potential offered by the mining of publicly available MD data. We identified systems with specific molecular composition and were able to characterize essential parameters of MD simulation such as temperature and simulation length, and could identify model resolution, such as all-atom and coarse-grain. Based on this analysis, we inferred metadata to propose a search engine prototype to explore the MD data. To continue in this direction, we call on the community to pursue the effort of sharing MD data, and to report and standardize metadata to reuse this valuable matter.
3
Emanuele E, Minoretti P. Measuring the Impact of Data Sharing: From Author-Level Metrics to Quantification of Economic and Non-tangible Benefits. Cureus 2023; 15:e50308. [PMID: 38205488 PMCID: PMC10777335 DOI: 10.7759/cureus.50308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/10/2023] [Indexed: 01/12/2024] Open
Abstract
In early 2023, the National Institutes of Health (NIH) implemented its Data Management and Sharing (DMS) Policy, requiring researchers to share scientific data produced with NIH funding. The policy's objective is to amplify the benefits of public investment in research by promoting the dissemination and reusability of primary data. Given this backdrop, identifying a robust methodology to assess the impact of data sharing across diverse research domains is essential. In this review, we adopted two methodological paradigms, the bottom-up and top-down strategies, and employed content analysis to pinpoint established methodologies and innovative practices within this intricate field. Although numerous author-level metrics are available to gauge the impact of data sharing, their application is still limited. Non-traditional metrics, encompassing economic (e.g., cost savings) and intangible benefits, presently appear to hold more potential for evaluating the impact of primary data sharing. Finally, we address the primary obstacles encountered by open data policies and introduce an innovative "Shared model for shared data" framework to bolster data sharing practices and refine evaluation metrics.
4
Bayer JM, Scully RA, Dlabola EK, Courtwright JL, Hirsch CL, Hockman-Wert D, Miller SW, Roper BB, Saunders WC, Snyder MN. Sharing FAIR monitoring program data improves discoverability and reuse. Environmental Monitoring and Assessment 2023; 195:1141. [PMID: 37665400 DOI: 10.1007/s10661-023-11788-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 08/24/2023] [Indexed: 09/05/2023]
Abstract
Data resulting from environmental monitoring programs are valuable assets for natural resource managers, decision-makers, and researchers. These data are often collected to inform specific reporting needs or decisions with a specific timeframe. While program-oriented data and related publications are effective for meeting program goals, sharing well-documented data and metadata allows users to research aspects outside initial program intentions. As part of an effort to integrate data from four long-term large-scale US aquatic monitoring programs, we evaluated the original datasets against the FAIR (Findable, Accessible, Interoperable, Reusable) data principles and offer recommendations and lessons learned. Differences in data governance across these programs resulted in considerable effort to access and reuse the original datasets. Requirements, guidance, and resources available to support data publishing and documentation are inconsistent across agencies and monitoring programs, resulting in various data formats and storage locations that are not easily found, accessed, or reused. Making monitoring data FAIR will reduce barriers to data discovery and reuse. Programs are continuously striving to improve data management, data products, and metadata; however, provision of related tools, consistent guidelines and standards, and more resources to do this work is needed. Given the value of these data and the significant effort required to access and reuse them, actions and steps intended to improve data documentation and accessibility are described.
Affiliation(s)
- Jennifer M Bayer
- U.S. Geological Survey, Pacific Northwest Aquatic Monitoring Partnership, Cook, WA, 98605, USA.
- Rebecca A Scully
- U.S. Geological Survey, Pacific Northwest Aquatic Monitoring Partnership, Cook, WA, 98605, USA
- Erin K Dlabola
- U.S. Geological Survey, Forest and Rangeland Ecosystem Science Center, Corvallis, OR, 97331, USA
- Jennifer L Courtwright
- Watershed Sciences Department, College of Natural Resources, Utah State University, Logan, UT, 84322, USA
- Christine L Hirsch
- United States Forest Service, Pacific Northwest Research Station, Corvallis, OR, 97331, USA
- David Hockman-Wert
- United States Forest Service, Pacific Northwest Research Station, Corvallis, OR, 97331, USA
- Scott W Miller
- Bureau of Land Management, National Operations Center, Denver, CO, 80225, USA
- Brett B Roper
- United States Forest Service, National Stream and Aquatic Ecology Center, Logan, UT, 84332, USA
- W Carl Saunders
- PACFISH/INFISH Biological Opinion Monitoring Program, United States Forest Service, Logan, UT, 84332, USA
- Marcía N Snyder
- United States Forest Service, Pacific Northwest Research Station, Corvallis, OR, 97331, USA
5
Nault R, Cave MC, Ludewig G, Moseley HN, Pennell KG, Zacharewski T. A Case for Accelerating Standards to Achieve the FAIR Principles of Environmental Health Research Experimental Data. Environmental Health Perspectives 2023; 131:65001. [PMID: 37352010 PMCID: PMC10289218 DOI: 10.1289/ehp11484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/25/2023]
Abstract
BACKGROUND: Funding agencies, publishers, and other stakeholders are pushing environmental health science investigators to improve data sharing; to promote the findable, accessible, interoperable, and reusable (FAIR) principles; and to increase the rigor and reproducibility of the data collected. Accomplishing these goals will require significant cultural shifts surrounding data management and strategies to develop robust and reliable resources that bridge the technical challenges and gaps in expertise. OBJECTIVE: In this commentary, we examine the current state of managing data and metadata, referred to collectively as (meta)data, in the experimental environmental health sciences. We introduce new tools and resources based on in vivo experiments to serve as examples for the broader field. METHODS: We discuss previous and ongoing efforts to improve (meta)data collection and curation. These include global efforts by the Functional Genomics Data Society to develop metadata collection tools such as the Investigation, Study, Assay (ISA) framework, and the Center for Expanded Data Annotation and Retrieval. We also conduct a case study of in vivo data deposited in the Gene Expression Omnibus that demonstrates the current state of in vivo environmental health data and highlights the value of using the tools we propose to support data deposition. DISCUSSION: The environmental health science community has played a key role in efforts to achieve the goals of the FAIR guiding principles and is well positioned to advance them further. We present a proposed framework to further promote these objectives and minimize the obstacles between data producers and data scientists to maximize the return on research investments. https://doi.org/10.1289/EHP11484.
Affiliation(s)
- Rance Nault
- Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
- Matthew C. Cave
- Division of Gastroenterology, Hepatology, and Nutrition, University of Louisville, Louisville, Kentucky, USA
- Gabriele Ludewig
- Department of Occupational and Environmental Health, University of Iowa, Iowa City, Iowa, USA
- Hunter N.B. Moseley
- Molecular and Cellular Biochemistry Department, University of Kentucky, Lexington, Kentucky, USA
- Kelly G. Pennell
- Department of Civil Engineering, University of Kentucky, Lexington, Kentucky, USA
- Tim Zacharewski
- Biochemistry & Molecular Biology Department, Institute for Integrative Toxicology, Michigan State University, East Lansing, Michigan, USA
6
Mayer Z, Kahn J, Götz M, Hou Y, Beiersdörfer T, Blumenröhr N, Volk R, Streit A, Schultmann F. Thermal Bridges on Building Rooftops. Sci Data 2023; 10:268. [PMID: 37164958 PMCID: PMC10171139 DOI: 10.1038/s41597-023-02140-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2022] [Accepted: 04/06/2023] [Indexed: 05/12/2023] Open
Abstract
Thermal Bridges on Building Rooftops (TBBR) is a multi-channel remote sensing dataset. It was recorded during six separate UAV fly-overs of the city center of Karlsruhe, Germany, and comprises a total of 926 high-resolution images with 6927 manually-provided thermal bridge annotations. Each image provides five channels: three color, one thermographic, and one computationally derived height map channel. The data is pre-split into training and test data subsets suitable for object detection and instance segmentation tasks. All data is organized and structured to comply with FAIR principles, i.e. being findable, accessible, interoperable, and reusable. It is publicly available and can be downloaded from the Zenodo data repository. This work provides a comprehensive data descriptor for the TBBR dataset to facilitate broad community uptake.
Affiliation(s)
- Zoe Mayer
- Karlsruhe Institute of Technology, Institute for Industrial Production, 76187, Karlsruhe, Germany.
- James Kahn
- Helmholtz AI, Karlsruhe, Germany
- Karlsruhe Institute of Technology, Steinbuch Centre for Computing, 76344, Eggenstein-Leopoldshafen, Germany
- Markus Götz
- Helmholtz AI, Karlsruhe, Germany.
- Karlsruhe Institute of Technology, Steinbuch Centre for Computing, 76344, Eggenstein-Leopoldshafen, Germany.
- Yu Hou
- Western New England University, Department of Construction Management, Springfield, MA, 01119, USA
- Carnegie Mellon University, Civil and Environmental Engineering Department, Pittsburgh, PA, 15213, USA
- Tobias Beiersdörfer
- Karlsruhe Institute of Technology, Institute for Industrial Production, 76187, Karlsruhe, Germany
- Nicolas Blumenröhr
- Karlsruhe Institute of Technology, Steinbuch Centre for Computing, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz Metadata Collaboration, Karlsruhe, Germany
- Rebekka Volk
- Karlsruhe Institute of Technology, Institute for Industrial Production, 76187, Karlsruhe, Germany.
- Achim Streit
- Karlsruhe Institute of Technology, Steinbuch Centre for Computing, 76344, Eggenstein-Leopoldshafen, Germany
- Frank Schultmann
- Karlsruhe Institute of Technology, Institute for Industrial Production, 76187, Karlsruhe, Germany
7
Tsueng G, Cano MAA, Bento J, Czech C, Kang M, Pache L, Rasmussen LV, Savidge TC, Starren J, Wu Q, Xin J, Yeaman MR, Zhou X, Su AI, Wu C, Brown L, Shabman RS, Hughes LD. Developing a standardized but extendable framework to increase the findability of infectious disease datasets. Sci Data 2023; 10:99. [PMID: 36823157 PMCID: PMC9950378 DOI: 10.1038/s41597-023-01968-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open
Abstract
Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We addressed this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
Affiliation(s)
- Ginger Tsueng
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
- Marco A Alvarado Cano
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- José Bento
- Department of Computer Science, Boston College, 245 Beacon St, Chestnut Hill, MA, 02467, USA
- Candice Czech
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Mengjia Kang
- Division of Pulmonary and Critical Care, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Lars Pache
- Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Tor C Savidge
- Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Justin Starren
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Qinglong Wu
- Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Jiwen Xin
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Michael R Yeaman
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Divisions of Molecular Medicine and Infectious Diseases, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Lundquist Institute for Infection & Immunity at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Xinghua Zhou
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Andrew I Su
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Scripps Research Translational Institute, La Jolla, CA, 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Chunlei Wu
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Scripps Research Translational Institute, La Jolla, CA, 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Liliana Brown
- Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Reed S Shabman
- Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Laura D Hughes
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
8
Wood-Charlson EM. The Importance of Sharing Data in Systems Biology. Metabolites 2023; 13:metabo13010099. [PMID: 36677023 PMCID: PMC9866890 DOI: 10.3390/metabo13010099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 12/27/2022] [Accepted: 01/03/2023] [Indexed: 01/11/2023] Open
Abstract
Systems biology research spans a range of biological scales and science domains, and often requires a collaborative effort to collect and share data so that integration is possible. However, sharing data effectively is a challenging task that requires effort and alignment between collaborative partners, as well as coordination between organizations, repositories, and journals. As a community of systems biology researchers, we must get better at efficiently sharing data, and ensuring that shared data comes with the recognition and citations it deserves.
Affiliation(s)
- Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, E.O. Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
9
The case for including microbial sequences in the electronic health record. Nat Med 2023; 29:22-25. [PMID: 36646805 DOI: 10.1038/s41591-022-02157-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
10
Modeling community standards for metadata as templates makes data FAIR. Sci Data 2022; 9:696. [DOI: 10.1038/s41597-022-01815-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 11/01/2022] [Indexed: 11/13/2022] Open
Abstract
It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be “rich” and to adhere to “domain-relevant” community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these “rich,” discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets, both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.