1
Venugopal V, Olivetti E. MatKG: An autonomously generated knowledge graph in Material Science. Sci Data 2024; 11:217. [PMID: 38368452 PMCID: PMC10874416 DOI: 10.1038/s41597-024-03039-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 02/01/2024] [Indexed: 02/19/2024]
Abstract
In this paper, we present MatKG, a knowledge graph in materials science that offers a repository of entities and relationships extracted from scientific literature. Using advanced natural language processing techniques, MatKG includes an array of entities, including materials, properties, applications, characterization and synthesis methods, descriptors, and symmetry phase labels. The graph is formulated based on statistical metrics, encompassing over 70,000 entities and 5.4 million unique triples. To enhance accessibility and utility, we have serialized MatKG in both CSV and RDF formats and made these, along with the code base, available to the research community. As the largest knowledge graph in materials science to date, MatKG provides structured organization of domain-specific data. Its deployment holds promise for various applications, including material discovery, recommendation systems, and advanced analytics.
Affiliation(s)
- Vineeth Venugopal
  - Massachusetts Institute of Technology (MIT), Department of Material Science and Engineering, Boston, 02139, USA.
- Elsa Olivetti
  - Massachusetts Institute of Technology (MIT), Department of Material Science and Engineering, Boston, 02139, USA.
2
Choi J, Lee B. Quantitative Topic Analysis of Materials Science Literature Using Natural Language Processing. ACS Applied Materials & Interfaces 2024; 16:1957-1968. [PMID: 38059688 DOI: 10.1021/acsami.3c12301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
Materials science research has garnered extensive attention from industry, society, policy, and academia. However, understanding the research landscape and extracting strategic insights are challenging due to the increasing diversity and volume of publications. This study proposes a natural language processing-based protocol for extracting text-encoded topics from a large volume of scientific literature, uncovering research interests of scientific communities, as well as convergence trends. We report a topic map, representing the materials science research landscape with text-mined 257 topics regarding biocompatible materials, structural materials, electrochemistry, or photonics. We analyze the topic map in terms of national research interests in materials science, revealing competitive positions and strategies of active nations. For example, it is found that the increasing trend of research interest in machine learning topic was captured in the United States earlier than other nations. Similarly, our journal-level analyses serve as reference information for journal recommendations and trend guidance, showing that the main topics and research interests of materials science journals slightly changed over time. Moreover, we build the topic association network which can highlight the status and future potential of interdisciplinary research, revealing research fields with high centrality in the network such as machine learning-enabled composite modeling, energy policy, or wearable electronics. This study offers insightful results on current and near-future materials science research landscapes, facilitating the understanding of stakeholders, amidst the fast-evolving and diverse knowledge of materials science.
Affiliation(s)
- Jaewoong Choi
  - Computational Science Research Center, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea
- Byungju Lee
  - Computational Science Research Center, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea
3
Lee J, Lee W, Kim J. MatGD: Materials Graph Digitizer. ACS Applied Materials & Interfaces 2024; 16:723-730. [PMID: 38147629 DOI: 10.1021/acsami.3c14781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
We developed Material Graph Digitizer (MatGD), which is a tool for digitizing a data line from scientific graphs. The algorithm behind the tool consists of four steps: (1) identifying graphs within subfigures, (2) separating axes and data sections, (3) discerning the data lines by eliminating irrelevant graph objects and matching with the legend, and (4) data extraction and saving. From the 62,534 papers in the areas of batteries, catalysis, and metal-organic frameworks (MOFs), 501,045 figures were mined. Remarkably, our tool showcased performance with over 99% accuracy in legend marker and text detection. Moreover, its capability for data line separation stood at 66%, which is much higher compared to those of other existing figure-mining tools. We believe that this tool will be integral to collecting both past and future data from publications, and these data can be used to train various machine learning models that can enhance material predictions and new materials discovery.
Affiliation(s)
- Jaewoong Lee
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Wonseok Lee
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
- Jihan Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
4
Schwenker E, Jiang W, Spreadbury T, Ferrier N, Cossairt O, Chan MK. EXSCLAIM!: Harnessing materials science literature for self-labeled microscopy datasets. Patterns (New York, N.Y.) 2023; 4:100843. [PMID: 38035197 PMCID: PMC10682750 DOI: 10.1016/j.patter.2023.100843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/29/2022] [Revised: 01/31/2023] [Accepted: 08/21/2023] [Indexed: 12/02/2023]
Abstract
This work introduces the EXSCLAIM! toolkit for the automatic extraction, separation, and caption-based natural language annotation of images from scientific literature. EXSCLAIM! is used to show how rule-based natural language processing and image recognition can be leveraged to construct an electron microscopy dataset containing thousands of keyword-annotated nanostructure images. Moreover, it is demonstrated how a combination of statistical topic modeling and semantic word similarity comparisons can be used to increase the number and variety of keyword annotations on top of the standard annotations from EXSCLAIM! With large-scale imaging datasets constructed from scientific literature, users are well positioned to train neural networks for classification and recognition tasks specific to microscopy, tasks that are often otherwise inhibited by a lack of sufficient annotated training data.
Affiliation(s)
- Eric Schwenker
  - Center for Nanoscale Materials, Argonne National Laboratory, Argonne, IL 60439, USA
  - Department of Materials Science and Engineering, Northwestern University, Evanston, IL 60208, USA
- Weixin Jiang
  - Center for Nanoscale Materials, Argonne National Laboratory, Argonne, IL 60439, USA
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Trevor Spreadbury
  - Center for Nanoscale Materials, Argonne National Laboratory, Argonne, IL 60439, USA
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Nicola Ferrier
  - Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL 60439, USA
- Oliver Cossairt
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Maria K.Y. Chan
  - Center for Nanoscale Materials, Argonne National Laboratory, Argonne, IL 60439, USA
5
Brito ACM, Oliveira MCF, Oliveira ON, Silva FN, Amancio DR. Network Analysis and Natural Language Processing to Obtain a Landscape of the Scientific Literature on Materials Applications. ACS Applied Materials & Interfaces 2023. [PMID: 37270838 DOI: 10.1021/acsami.3c01632] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Recent progress in natural language processing (NLP) enables mining the literature in various tasks akin to knowledge discovery. Obtaining an updated bird's-eye view of key research topics and their evolution in a vast, dynamic field such as materials science is challenging even for experienced researchers. In this Perspective paper, we present a landscape of the area of applied materials in selected representative journals based on a combination of methods from network science and simple NLP strategies. We found a predominance of energy-related materials, e.g., for batteries and catalysis, organic electronics, which include flexible sensors and flexible electronics, and nanomedicine with various topics of materials used in diagnosis and therapy. As for the impact calculated through standard metrics of impact factor, energy-related materials and organic electronics are again top of the list across different journals, while work in nanomedicine has been found to have a lower impact in the journals analyzed. The adequacy of the approach to identify key research topics in materials applications was verified indirectly by comparing the topics identified in journals with diverse scopes, including journals that are not specific to materials. The approach can be employed to obtain a fast overview of a given field from the papers published in related scientific journals, which can be adapted or extended to any research area.
Affiliation(s)
- Ana Caroline M Brito
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13560-970, Brazil
- Maria Cristina F Oliveira
  - Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13560-970, Brazil
- Osvaldo N Oliveira
  - São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo 13560-970, Brazil
- Filipi N Silva
  - Indiana University Network Science Institute, Bloomington, Indiana 47408, United States
- Diego R Amancio
  - Institute of Mathematical Sciences and Computing, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
6
Chen X, Ye P, Huang L, Wang C, Cai Y, Deng L, Ren H. Exploring science-technology linkages: A deep learning-empowered solution. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
7
Abstract
A great number of scientific papers are published every year in the field of battery research, which forms a huge textual data source. However, it is difficult to explore and retrieve useful information efficiently from these large unstructured sets of text. The Bidirectional Encoder Representations from Transformers (BERT) model, trained on a large data set in an unsupervised way, provides a route to process the scientific text automatically with minimal human effort. To this end, we realized six battery-related BERT models, namely, BatteryBERT, BatteryOnlyBERT, and BatterySciBERT, each of which consists of both cased and uncased models. They have been trained specifically on a corpus of battery research papers. The pretrained BatteryBERT models were then fine-tuned on downstream tasks, including battery paper classification and extractive question-answering for battery device component classification that distinguishes anode, cathode, and electrolyte materials. Our BatteryBERT models were found to outperform the original BERT models on the specific battery tasks. The fine-tuned BatteryBERT was then used to perform battery database enhancement. We also provide a website application for its interactive use and visualization.
Affiliation(s)
- Shu Huang
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
- Jacqueline M. Cole
  - Cavendish Laboratory, Department of Physics, University of Cambridge, J.J. Thomson Avenue, Cambridge CB3 0HE, U.K.
  - ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.
8
Kumar A, Ganesh S, Gupta D, Kodamana H. A text mining framework for screening catalysts and critical process parameters from scientific literature - A study on Hydrogen production from alcohol. Chem Eng Res Des 2022. [DOI: 10.1016/j.cherd.2022.05.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
9
Wondraczek L, Bouchbinder E, Ehrlicher A, Mauro JC, Sajzew R, Smedskjaer MM. Advancing the Mechanical Performance of Glasses: Perspectives and Challenges. Advanced Materials (Deerfield Beach, Fla.) 2022; 34:e2109029. [PMID: 34870862 DOI: 10.1002/adma.202109029] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/08/2021] [Revised: 11/29/2021] [Indexed: 06/13/2023]
Abstract
Glasses are materials that lack a crystalline microstructure and long-range atomic order. Instead, they feature heterogeneity and disorder on superstructural scales, which have profound consequences for their elastic response, material strength, fracture toughness, and the characteristics of dynamic fracture. These structure-property relations present a rich field of study in fundamental glass physics and are also becoming increasingly important in the design of modern materials with improved mechanical performance. A first step in this direction involves glass-like materials that retain optical transparency and the haptics of classical glass products, while overcoming the limitations of brittleness. Among these, novel types of oxide glasses, hybrid glasses, phase-separated glasses, and bioinspired glass-polymer composites hold significant promise. Such materials are designed from the bottom-up, building on structure-property relations, modeling of stresses and strains at relevant length scales, and machine learning predictions. Their fabrication requires a more scientifically driven approach to materials design and processing, building on the physics of structural disorder and its consequences for structural rearrangements, defect initiation, and dynamic fracture in response to mechanical load. In this article, a perspective is provided on this highly interdisciplinary field of research in terms of its most recent challenges and opportunities.
Affiliation(s)
- Lothar Wondraczek
  - Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, Fraunhoferstrasse 6, 07743, Jena, Germany
  - Center of Energy and Environmental Chemistry Jena (CEEC Jena), Friedrich Schiller University Jena, Philosophenweg 7, 07743, Jena, Germany
- Eran Bouchbinder
  - Chemical and Biological Physics Department, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Allen Ehrlicher
  - Department of Bioengineering, McGill University, Montreal, H3A 2A7, Canada
- John C Mauro
  - Department of Materials Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Roman Sajzew
  - Otto Schott Institute of Materials Research, Friedrich Schiller University Jena, Fraunhoferstrasse 6, 07743, Jena, Germany
- Morten M Smedskjaer
  - Department of Chemistry and Bioscience, Aalborg University, Aalborg, 9220, Denmark