For: Akhondi SA, Rey H, Schwörer M, Maier M, Toomey J, Nau H, Ilchmann G, Sheehan M, Irmer M, Bobach C, Doornenbal M, Gregory M, Kors JA. Automatic identification of relevant chemical compounds from patents. Database (Oxford) 2019;2019:5301319. [PMID: 30698776 PMCID: PMC6351730 DOI: 10.1093/database/baz001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/29/2018] [Revised: 12/16/2018] [Accepted: 12/28/2018] [Indexed: 12/29/2022]

Cited by Other Article(s)

Morin L, Weber V, Meijer GI, Yu F, Staar PWJ. PatCID: an open-access dataset of chemical structures in patent documents. Nat Commun 2024;15:6532. [PMID: 39095357 PMCID: PMC11297020 DOI: 10.1038/s41467-024-50779-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 07/19/2024] [Indexed: 08/04/2024] Open

Zhu TF, Qian R, Wei X, Lu AP, Cao DS. PatentNetML: A Novel Framework for Predicting Key Compounds in Patents Using Network Science and Machine Learning. J Med Chem 2024;67:1347-1359. [PMID: 38181431 DOI: 10.1021/acs.jmedchem.3c01893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2024]

Machi K, Akiyama S, Nagata Y, Yoshioka M. OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles. J Chem Inf Model 2023;63:6619-6628. [PMID: 37859303 PMCID: PMC10647022 DOI: 10.1021/acs.jcim.3c01449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]

Vaškevičius M, Kapočiūtė-Dzikienė J, Vaškevičius A, Šlepikas L. Deep learning-based automatic action extraction from structured chemical synthesis procedures. PeerJ Comput Sci 2023;9:e1511. [PMID: 37705639 PMCID: PMC10495970 DOI: 10.7717/peerj-cs.1511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 07/07/2023] [Indexed: 09/15/2023]

Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022;51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 1306] [Impact Index Per Article: 435.3] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] Open

Affiliation(s)

Sunghwan Kim National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Jie Chen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Tiejun Cheng National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Asta Gindulyte National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Jia He National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Siqian He National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Qingliang Li National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Benjamin A Shoemaker National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Paul A Thiessen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Bo Yu National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Leonid Zaslavsky National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Jian Zhang National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
Evan E Bolton To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;

Wang J, Shen Z, Liao Y, Yuan Z, Li S, He G, Lan M, Qian X, Zhang K, Li H. Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space. Brief Bioinform 2022;23:6761958. [PMID: 36252922 PMCID: PMC9677486 DOI: 10.1093/bib/bbac461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/31/2022] [Revised: 09/21/2022] [Accepted: 09/26/2022] [Indexed: 12/14/2022] Open

Abstract

Identification of new chemical compounds with desired structural diversity and biological properties plays an essential role in drug discovery, yet the construction of such a potential space with elements of 'near-drug' properties is still a challenging task. In this work, we proposed a multimodal chemical information reconstruction system to automatically process, extract and align heterogeneous information from the text descriptions and structural images of chemical patents. Our key innovation lies in a heterogeneous data generator that produces cross-modality training data in the form of text descriptions and Markush structure images, from which a two-branch model with image- and text-processing units can then learn to both recognize heterogeneous chemical entities and simultaneously capture their correspondence. In particular, we have collected chemical structures from ChEMBL database and chemical patents from the European Patent Office and the US Patent and Trademark Office using keywords 'A61P, compound, structure' in the years from 2010 to 2020, and generated heterogeneous chemical information datasets with 210K structural images and 7818 annotated text snippets. Based on the reconstructed results and substituent replacement rules, structural libraries of a huge number of near-drug compounds can be generated automatically. In quantitative evaluations, our model can correctly reconstruct 97% of the molecular images into structured format and achieve an F1-score around 97-98% in the recognition of chemical entities, which demonstrated the effectiveness of our model in automatic information extraction from chemical patents, and hopefully transforming them to a user-friendly, structured molecular database enriching the near-drug space to realize the intelligent retrieval technology of chemical knowledge.

Richman S, Lyman C, Nesterova A, Yuryev A, Morris M, Cao H, Cheadle C, Skuse G, Broderick G. Old drugs, new tricks: leveraging known compounds to disrupt coronavirus-induced cytokine storm. NPJ Syst Biol Appl 2022;8:38. [PMID: 36216820 PMCID: PMC9549818 DOI: 10.1038/s41540-022-00250-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 09/27/2022] [Indexed: 11/11/2022] Open

Wang J, Ren Y, Zhang Z, Xu H, Zhang Y. From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents. Front Res Metr Anal 2021;6:691105. [PMID: 35005421 PMCID: PMC8727901 DOI: 10.3389/frma.2021.691105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2021] [Accepted: 11/02/2021] [Indexed: 11/28/2022] Open

Zhai Z, Druckenbrodt C, Thorne C, Akhondi SA, Nguyen DQ, Cohn T, Verspoor K. ChemTables: a dataset for semantic classification on tables in chemical patents. J Cheminform 2021;13:97. [PMID: 34895295 PMCID: PMC8665561 DOI: 10.1186/s13321-021-00568-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2021] [Accepted: 11/06/2021] [Indexed: 11/10/2022] Open

Abstract

Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both the number and the size of tables can be very large in patent documents. In addition, various types of information can be presented in tables in patents, including spectroscopic and physical data, or pharmacological use and effects of chemicals. Since images of Markush structures and merged cells are commonly used in these tables, their structure also shows substantial variation. This heterogeneity in content and structure of tables in chemical patents makes relevant information difficult to find. We therefore propose a new text mining task of automatically categorising tables in chemical patents based on their contents. Categorisation of tables based on the nature of their content can help to identify tables containing key information, improving the accessibility of information in patents that is highly relevant for new inventions. For developing and evaluating methods for the table classification task, we developed a new dataset, called CHEMTABLES, which consists of 788 chemical patent tables with labels of their content type. We introduce this data set in detail. We further establish strong baselines for the table classification task in chemical patents by applying state-of-the-art neural network models developed for natural language processing, including TabNet, ResNet and Table-BERT on CHEMTABLES. The best performing model, Table-BERT, achieves a performance of 88.66 micro-averaged F1 score on the table classification task. The CHEMTABLES dataset is publicly available at https://doi.org/10.17632/g7tjh7tbrj.3, subject to the CC BY NC 3.0 license. Code/models evaluated in this work are in a GitHub repository: https://github.com/zenanz/ChemTables.

Spreafico C, Spreafico M. Using text mining to retrieve information about circular economy. Comput Ind 2021. [DOI: 10.1016/j.compind.2021.103525] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]

Ohms J. Current methodologies for chemical compound searching in patents: A case study. World Patent Information 2021. [DOI: 10.1016/j.wpi.2021.102055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]

Congenericity of Claimed Compounds in Patent Applications. Molecules 2021;26:5253. [PMID: 34500686 PMCID: PMC8433967 DOI: 10.3390/molecules26175253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/23/2021] [Revised: 08/17/2021] [Accepted: 08/18/2021] [Indexed: 12/04/2022] Open

Falaguera MJ, Mestres J. Identification of the Core Chemical Structure in SureChEMBL Patents. J Chem Inf Model 2021;61:2241-2247. [PMID: 33929850 DOI: 10.1021/acs.jcim.1c00151] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]

Milman BL, Zhurkovich IK. Statistics of the Popularity of Chemical Compounds in Relation to the Non-Target Analysis. Molecules 2021;26:2394. [PMID: 33924131 PMCID: PMC8074313 DOI: 10.3390/molecules26082394] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/06/2021] [Revised: 04/14/2021] [Accepted: 04/18/2021] [Indexed: 11/25/2022] Open

Islamaj R, Leaman R, Kim S, Kwon D, Wei CH, Comeau DC, Peng Y, Cissel D, Coss C, Fisher C, Guzman R, Kochar PG, Koppel S, Trinh D, Sekiya K, Ward J, Whitman D, Schmidt S, Lu Z. NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature. Sci Data 2021;8:91. [PMID: 33767203 PMCID: PMC7994842 DOI: 10.1038/s41597-021-00875-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 11/26/2020] [Accepted: 01/19/2021] [Indexed: 11/13/2022] Open

Affiliation(s)

Rezarta Islamaj National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Robert Leaman National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Sun Kim National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Dongseop Kwon National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Chih-Hsuan Wei National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Donald C Comeau National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Yifan Peng National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
David Cissel National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Cathleen Coss National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Carol Fisher National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Rob Guzman National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Preeti Gokal Kochar National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Stella Koppel National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Dorothy Trinh National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Keiko Sekiya National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Janice Ward National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Deborah Whitman National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Susan Schmidt National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
Zhiyong Lu National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.

He J, Nguyen DQ, Akhondi SA, Druckenbrodt C, Thorne C, Hoessel R, Afzal Z, Zhai Z, Fang B, Yoshikawa H, Albahem A, Cavedon L, Cohn T, Baldwin T, Verspoor K. ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents. Front Res Metr Anal 2021;6:654438. [PMID: 33870071 PMCID: PMC8028406 DOI: 10.3389/frma.2021.654438] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/16/2021] [Accepted: 02/24/2021] [Indexed: 11/21/2022] Open

Drug repurposing patent documents vs peer review: patent information comes more than 600 days earlier on average. Future Drug Discovery 2020. [DOI: 10.4155/fdd-2020-0001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 02/04/2023] Open

Jose JM, Yilmaz E, Magalhães J, Castells P, Ferro N, Silva MJ, Martins F, Akhondi SA, Cohn T, Baldwin T, Verspoor K. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents. Advances in Information Retrieval 2020;12036. [PMCID: PMC7148043 DOI: 10.1007/978-3-030-45442-5_74] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/23/2022]