Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wei CH, Kao HY, Lu Z. GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains. Biomed Res Int 2015;2015:918710. [PMID: 26380306 DOI: 10.1155/2015/918710] [Citation(s) in RCA: 111] [Impact Index Per Article: 12.3] [Received: 01/15/2015] [Revised: 04/03/2015] [Accepted: 04/04/2015] [Indexed: 02/01/2023]

Cited by Other Article(s)

Li Z, Wei Q, Huang LC, Li J, Hu Y, Chuang YS, He J, Das A, Keloth VK, Yang Y, Diala CS, Roberts KE, Tao C, Jiang X, Zheng WJ, Xu H. Ensemble pretrained language models to extract biomedical knowledge from literature. J Am Med Inform Assoc 2024:ocae061. [PMID: 38520725 DOI: 10.1093/jamia/ocae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 02/14/2024] [Accepted: 03/12/2024] [Indexed: 03/25/2024] Open

Affiliation(s)

Zhao Li: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Qiang Wei: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Liang-Chin Huang: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Jianfu Li: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Yan Hu: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Yao-Shun Chuang: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Jianping He: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Avisha Das: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Vipina Kuttichi Keloth: Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States
Yuntao Yang: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Chiamaka S Diala: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Kirk E Roberts: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Cui Tao: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Xiaoqian Jiang: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
W Jim Zheng: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Hua Xu: Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, United States

Park YJ, Yang GJ, Sohn CB, Park SJ. GPDminer: a tool for extracting named entities and analyzing relations in biological literature. BMC Bioinformatics 2024;25:101. [PMID: 38448845 PMCID: PMC10916184 DOI: 10.1186/s12859-024-05710-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 02/19/2024] [Indexed: 03/08/2024] Open

Huang DL, Zeng Q, Xiong Y, Liu S, Pang C, Xia M, Fang T, Ma Y, Qiang C, Zhang Y, Zhang Y, Li H, Yuan Y. A Combined Manual Annotation and Deep-Learning Natural Language Processing Study on Accurate Entity Extraction in Hereditary Disease Related Biomedical Literature. Interdiscip Sci 2024:10.1007/s12539-024-00605-2. [PMID: 38340264 DOI: 10.1007/s12539-024-00605-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 02/12/2024]

Kilicoglu H, Ensan F, McInnes B, Wang LL. Semantics-enabled biomedical literature analytics. J Biomed Inform 2024;150:104588. [PMID: 38244957 DOI: 10.1016/j.jbi.2024.104588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/22/2024]

Wu Z, Feng C, Hu Y, Zhou Y, Li S, Zhang S, Hu Y, Chen Y, Chao H, Ni Q, Chen M. HALD, a human aging and longevity knowledge graph for precision gerontology and geroscience analyses. Sci Data 2023;10:851. [PMID: 38040715 PMCID: PMC10692171 DOI: 10.1038/s41597-023-02781-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 11/23/2023] [Indexed: 12/03/2023] Open

Kartchner D, Deng J, Lohiya S, Kopparthi T, Bathala P, Domingo-Fernández D, Mitchell CS. A Comprehensive Evaluation of Biomedical Entity Linking Models. Proceedings of the Conference on Empirical Methods in Natural Language Processing 2023;2023:14462-14478. [PMID: 38756862 PMCID: PMC11097978 DOI: 10.18653/v1/2023.emnlp-main.893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2024]

Xu Q, Liu Y, Sun D, Huang X, Li F, Zhai J, Li Y, Zhou Q, Qian N, Niu B. OncoCTMiner: streamlining precision oncology trial matching via molecular profile analysis. Database (Oxford) 2023;2023:baad077. [PMID: 37935585 PMCID: PMC10630409 DOI: 10.1093/database/baad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 09/08/2023] [Accepted: 10/21/2023] [Indexed: 11/09/2023]

Affiliation(s)

Quan Xu: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China; Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Yueyue Liu: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Dawei Sun: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China; Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Xiaoqian Huang: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Feihong Li: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
JinCheng Zhai: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Yang Li: Beijing International Center for Mathematical Research, Peking University, No. 5 Yiheyuan Road Haidian District, Beijing 100871, China; Chongqing Research Institute of Big Data, Peking University, Chongqing 401333, China
Qiming Zhou: Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China; Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Niansong Qian: Department of Oncology, Senior Department of Respiratory and Critical Care Medicine, The Eighth Medical Center of Chinese PLA General Hospital, No.17 A Heishanhu Road, Haidian District, Beijing 100853, China
Beifang Niu: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Garda S, Weber-Genzel L, Martin R, Leser U. BELB: a biomedical entity linking benchmark. Bioinformatics 2023;39:btad698. [PMID: 37975879 PMCID: PMC10681865 DOI: 10.1093/bioinformatics/btad698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 10/30/2023] [Accepted: 11/16/2023] [Indexed: 11/19/2023] Open

Wei CH, Luo L, Islamaj R, Lai PT, Lu Z. GNorm2: an improved gene name recognition and normalization system. Bioinformatics 2023;39:btad599. [PMID: 37878810 PMCID: PMC10612401 DOI: 10.1093/bioinformatics/btad599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Revised: 09/06/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open

Sosa DN, Hintzen R, Xiong B, de Giorgio A, Fauqueur J, Davies M, Lever J, Altman RB. Associating biological context with protein-protein interactions through text mining at PubMed scale. J Biomed Inform 2023;145:104474. [PMID: 37572825 DOI: 10.1016/j.jbi.2023.104474] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Revised: 08/03/2023] [Accepted: 08/05/2023] [Indexed: 08/14/2023]

Pu Y, Beck D, Verspoor K. Graph embedding-based link prediction for literature-based discovery in Alzheimer's Disease. J Biomed Inform 2023;145:104464. [PMID: 37541406 DOI: 10.1016/j.jbi.2023.104464] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/22/2023] [Revised: 07/29/2023] [Accepted: 07/30/2023] [Indexed: 08/06/2023]

Abstract

OBJECTIVE

We explore the framing of literature-based discovery (LBD) as link prediction and graph embedding learning, with Alzheimer's Disease (AD) as our focus disease context. The key link prediction setting of prediction window length is specifically examined in the context of a time-sliced evaluation methodology.

METHODS

We propose a four-stage approach to explore literature-based discovery for Alzheimer's Disease, creating and analyzing a knowledge graph tailored to the AD context, and predicting and evaluating new knowledge based on time-sliced link prediction. The first stage is to collect an AD-specific corpus. The second stage involves constructing an AD knowledge graph with identified AD-specific concepts and relations from the corpus. In the third stage, 20 pairs of training and testing datasets are constructed with the time-slicing methodology. Finally, we infer new knowledge with graph embedding-based link prediction methods. We compare different link prediction methods in this context. The impact of limiting prediction evaluation of LBD models in the context of short-term and longer-term knowledge evolution for Alzheimer's Disease is assessed.
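As an illustration of the time-slicing step described above, here is a minimal Python sketch. It is not the authors' implementation: the `DatedEdge` type, the `time_slice` function, the node names, and the years are all hypothetical, and edges are assumed to be undirected and stamped with the year they first appear.

```python
from typing import Iterable, NamedTuple, Set, Tuple

Pair = Tuple[str, str]


class DatedEdge(NamedTuple):
    source: str
    target: str
    year: int  # assumed: year the relation first appears in the corpus


def _pair(edge: DatedEdge) -> Pair:
    # Treat edges as undirected by storing the endpoints in sorted order.
    return tuple(sorted((edge.source, edge.target)))


def time_slice(edges: Iterable[DatedEdge], cutoff_year: int, window: int) -> Tuple[Set[Pair], Set[Pair]]:
    """Split edges into training links (up to the cutoff year) and test links
    that first appear within `window` years after the cutoff."""
    edges = list(edges)
    train = {_pair(e) for e in edges if e.year <= cutoff_year}
    future = {_pair(e) for e in edges if cutoff_year < e.year <= cutoff_year + window}
    return train, future - train  # only genuinely new links are test positives


# Toy, made-up graph: g* are gene-like nodes, p* are phenotype-like nodes.
EDGES = [
    DatedEdge("g1", "p1", 1995),
    DatedEdge("g1", "g2", 2001),
    DatedEdge("g3", "p2", 2013),
    DatedEdge("g1", "g3", 2019),
]

short_train, short_test = time_slice(EDGES, cutoff_year=2010, window=5)
long_train, long_test = time_slice(EDGES, cutoff_year=2010, window=10)
print(len(short_test), len(long_test))
```

With the same cutoff, a longer window admits more future links as positives, which is one way to see why the choice of prediction window can change evaluation results.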

RESULTS

We constructed an AD corpus of over 16 k papers published in 1977-2021, and automatically annotated it with concepts and relations covering 11 AD-specific semantic entity types. The knowledge graph of Alzheimer's Disease derived from this resource consisted of ∼11 k nodes and ∼394 k edges, among which 34% were genotype-phenotype relationships, 57% were genotype-genotype relationships, and 9% were phenotype-phenotype relationships. A Structural Deep Network Embedding (SDNE) model consistently showed the best performance in terms of returning the most confident set of link predictions as time progresses over 20 years. A huge improvement in model performance was observed when changing the link prediction evaluation setting to consider a more distant future, reflecting the time required for knowledge accumulation.

CONCLUSION

Neural network graph-embedding link prediction methods show promise for the literature-based discovery context, although the prediction setting is extremely challenging, with graph densities of less than 1%. Varying prediction window length on the time-sliced evaluation methodology leads to hugely different results and interpretations of LBD studies. Our approach can be generalized to enable knowledge discovery for other diseases.
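The density figure can be sanity-checked against the rounded graph size reported in the results (about 11 k nodes and 394 k edges). The sketch below assumes a simple undirected graph and uses those rounded totals, so it only approximates whatever per-slice densities the authors computed; `graph_density` is a hypothetical helper.

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: edges present / edges possible."""
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_edges


# Rounded totals taken from the abstract; treated here as exact for illustration.
density = graph_density(num_nodes=11_000, num_edges=394_000)
print(f"{density:.2%}")  # about 0.65%
```

Under these assumptions the full graph has a density of roughly 0.65%, consistent with the "less than 1%" statement.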

AVAILABILITY

Code, AD ontology, and data are available at https://github.com/READ-BioMed/readbiomed-lbd.

Lyons EL, Watson D, Alodadi MS, Haugabook SJ, Tawa GJ, Hannah-Shmouni F, Porter FD, Collins JR, Ottinger EA, Mudunuri US. Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus. BMC Genomics 2023;24:460. [PMID: 37587458 PMCID: PMC10433598 DOI: 10.1186/s12864-023-09561-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 08/08/2023] [Indexed: 08/18/2023] Open

Abstract

BACKGROUND

Approximately 4-8% of the world suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain.

RESULTS

This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm.

CONCLUSIONS

Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing to identify diseases, genes, and variants, and integrating them with impact prediction algorithms and with details on clinical phenotypes and functional assessments, might be a promising approach to scale literature mining of variants and assign correct pathogenicity. The curated variant list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.

Sun Z, Tao C. Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria. IEEE International Conference on Healthcare Informatics 2023;2023:558-564. [PMID: 38283164 PMCID: PMC10815931 DOI: 10.1109/ichi57859.2023.00100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]

Abstract

Alzheimer's Disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Finding effective treatments for this disease is crucial. Clinical trials play an essential role in developing and testing new treatments for AD. However, identifying eligible participants can be challenging, time-consuming, and costly. In recent years, the development of natural language processing (NLP) techniques, specifically named entity recognition (NER) and named entity normalization (NEN), has helped to automate the identification and extraction of relevant information from the eligibility criteria (EC) more efficiently, in order to facilitate semi-automatic patient recruitment and enable data FAIRness for clinical trial data. Nevertheless, most current biomedical NER models only provide annotations for a restricted set of entity types that may not be applicable to the clinical trial data. Additionally, there are currently no established techniques for accurately performing NEN on entities that are negated using a negative prefix. In this paper, we introduce a pipeline designed for information extraction from AD clinical trial EC, which involves preprocessing of the EC data, clinical NER, and biomedical NEN to Unified Medical Language System (UMLS). Our NER model can identify named entities in seven pre-defined categories, while our NEN model employs a combination of exact match and partial match search strategies, as well as customized rules to accurately normalize entities with negative prefixes. To evaluate the performance of our pipeline, we measured the precision, recall, and F1 score for the NER component, and we manually reviewed the top five mapping results produced by the NEN component. Our evaluation of the pipeline's performance revealed that it can successfully normalize named entities in clinical trial ECs with optimal accuracies. The NER component achieved an overall F1 of 0.816, demonstrating its ability to accurately identify seven types of named entities in clinical text. The NEN component of the pipeline also demonstrated impressive performance, with customized rules and a combination of exact and partial match strategies leading to an accuracy of 0.940 for normalized entities.
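The precision, recall, and F1 metrics mentioned in the abstract can be illustrated with a short Python sketch. It assumes exact-match, entity-level scoring over (start, end, type) spans; the abstract does not say which matching scheme was used, and the entity types and offsets below are invented for the example.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level scores: a prediction counts only if
    its boundaries and type both match a gold annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one eligibility-criteria sentence.
gold = {(0, 9, "Condition"), (15, 24, "Drug"), (30, 38, "Procedure")}
predicted = {(0, 9, "Condition"), (15, 24, "Drug"), (40, 45, "Drug")}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here two of three predictions are correct and two of three gold entities are found, so all three scores come out to about 0.667.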

Faessler E, Hahn U, Schäuble S. GePI: large-scale text mining, customized retrieval and flexible filtering of gene/protein interactions. Nucleic Acids Res 2023:7177881. [PMID: 37224532 DOI: 10.1093/nar/gkad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Revised: 05/01/2023] [Accepted: 05/11/2023] [Indexed: 05/26/2023] Open

Nicholson DN, Alquaddoomi F, Rubinetti V, Greene CS. Changing word meanings in biomedical literature reveal pandemics and new technologies. BioData Min 2023;16:16. [PMID: 37147665 PMCID: PMC10161184 DOI: 10.1186/s13040-023-00332-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/24/2023] [Indexed: 05/07/2023] Open

Kroll H, Pirklbauer J, Kalo JC, Kunz M, Ruthmann J, Balke WT. A discovery system for narrative query graphs: entity-interaction-aware document retrieval. International Journal on Digital Libraries 2023:1-22. [PMID: 37361126 PMCID: PMC10123011 DOI: 10.1007/s00799-023-00356-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 01/19/2023] [Accepted: 03/16/2023] [Indexed: 06/28/2023]

Monteiro JP, Morine MJ, Ued FV, Kaput J. Identifying and Analyzing Topic Clusters in a Nutri-, Food-, and Diet-Proteomic Corpus Using Machine Reading. Nutrients 2023;15:nu15020270. [PMID: 36678141 PMCID: PMC9863309 DOI: 10.3390/nu15020270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/27/2022] [Accepted: 12/30/2022] [Indexed: 01/06/2023] Open

Saxena P, Rauniyar S, Thakur P, Singh RN, Bomgni A, Alaba MO, Tripathi AK, Gnimpieba EZ, Lushbough C, Sani RK. Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria. Front Microbiol 2023;14:1086021. [PMID: 37125195 PMCID: PMC10133479 DOI: 10.3389/fmicb.2023.1086021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 03/23/2023] [Indexed: 05/02/2023] Open

Abstract

The growth and survival of an organism in a particular environment depends heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that thrive on sulfate reduction for their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize the essential genes based on their key metabolic pathways. Herein, we report a feedback loop framework for gene of interest discovery, from bio-problem to gene set of interest, leveraging expert annotation with computational prediction. The defined bio-problem was applied to retrieve the genes of SRB from literature databases (PubMed and PubMed Central), and the genes were annotated to the genome of OA G20. The retrieved gene list was further used to enrich protein-protein interactions and was corroborated with the pangenome analysis, to categorize the enriched gene sets and the respective pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis demonstrated the gene distribution, where 69.83% of the 116 enriched genes were mapped under "persistent," inferring the essentiality of these genes. Likewise, 21.55% of the enriched genes, which include in particular the formate dehydrogenases and metallic hydrogenases, appeared under "shell." Our methodology suggests that semi-automated text mining and network analysis may play a crucial role in deciphering previously unexplored genes and key mechanisms, which can help to generate a baseline prior to performing any experimental studies.
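The percentages in the abstract can be turned back into gene counts with a little arithmetic. The sketch below assumes both percentages are taken over the same 116 enriched genes; the abstract does not say how the remaining genes were classified, so they are reported only as "other".

```python
TOTAL_ENRICHED_GENES = 116

# Percentages as reported in the abstract.
persistent_count = round(0.6983 * TOTAL_ENRICHED_GENES)
shell_count = round(0.2155 * TOTAL_ENRICHED_GENES)

# Whatever is left over; its partition label is not given in the abstract.
other_count = TOTAL_ENRICHED_GENES - persistent_count - shell_count
other_share = other_count / TOTAL_ENRICHED_GENES

print(persistent_count, shell_count, other_count, f"{other_share:.2%}")
```

Under that assumption, the figures correspond to 81 "persistent" genes and 25 "shell" genes, leaving 10 genes (about 8.62%) unaccounted for in the abstract.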

Affiliation(s)

Priya Saxena: Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
Shailabh Rauniyar: Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
Payal Thakur: Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
Ram Nageena Singh: Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
Alain Bomgni: Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Mathew O. Alaba: Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Abhilash Kumar Tripathi: Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
Etienne Z. Gnimpieba (corresponding author): Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Carol Lushbough: Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
Rajesh Kumar Sani (corresponding author): Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States; Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States; 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States; BuG ReMeDEE Consortium, South Dakota School of Mines and Technology, Rapid City, SD, United States

Ivanisenko TV, Demenkov PS, Kolchanov NA, Ivanisenko VA. The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. Int J Mol Sci 2022;23:ijms232314934. [PMID: 36499269 PMCID: PMC9738852 DOI: 10.3390/ijms232314934] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/14/2022] [Revised: 11/19/2022] [Accepted: 11/22/2022] [Indexed: 12/05/2022] Open

Zheng X, Du H, Luo X, Tong F, Song W, Zhao D. BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework. BMC Bioinformatics 2022;23:501. [PMID: 36418937 PMCID: PMC9682683 DOI: 10.1186/s12859-022-05051-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Accepted: 11/10/2022] [Indexed: 11/24/2022] Open

Abstract

BACKGROUND

Automatic and accurate recognition of various biomedical named entities from literature is an important task in biomedical text mining and is the foundation for extracting biomedical knowledge from unstructured texts into structured formats. Using the sequence labeling framework and deep neural networks to implement biomedical named entity recognition (BioNER) is currently a common method. However, this method often underutilizes syntactic features such as the dependencies and topology of sentences. Integrating semantic and syntactic features into the BioNER model is therefore an urgent problem.

RESULTS

In this paper, we propose a novel biomedical named entity recognition model, named BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulates the BioNER task as a node classification problem. This formulation can introduce more topological features of language and is no longer concerned only with the distance between words in the sequence. First, we use periods to segment sentences, and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as parts of speech, dependencies, and topology are preprocessed by SpaCy. A graph attention network is then used to generate a fused representation considering both the contextual features and syntactic features. Last, a softmax function is used to calculate the probabilities and get the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing BioNER state-of-the-art methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, and achieves F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, 90.99%, respectively.

CONCLUSION

The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model, and indicate that formulating the BioNER task into a node classification problem and combining syntactic features into the graph attention networks can significantly improve model performance.

Nicholson DN, Himmelstein DS, Greene CS. Expanding a database-derived biomedical knowledge graph via multi-relation extraction from biomedical abstracts. BioData Min 2022;15:26. [PMID: 36258252 PMCID: PMC9578183 DOI: 10.1186/s13040-022-00311-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 09/17/2022] [Indexed: 02/04/2023] Open

Abstract

BACKGROUND

Knowledge graphs support biomedical research efforts by providing contextual information for biomedical entities, constructing networks, and supporting the interpretation of high-throughput analyses. These databases are populated via manual curation, which is challenging to scale with an exponentially rising publication rate. Data programming is a paradigm that circumvents this arduous manual process by combining databases with simple rules and heuristics written as label functions, which are programs designed to annotate textual data automatically. Unfortunately, writing a useful label function requires substantial error analysis and is a nontrivial task that takes multiple days per function. This bottleneck makes populating a knowledge graph with multiple nodes and edge types practically infeasible. Thus, we sought to accelerate the label function creation process by evaluating how label functions can be re-used across multiple edge types.
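To make the idea of label functions concrete, here is a minimal Python sketch of a few label functions for a Compound-binds-Gene style edge, combined by simple majority vote. Everything in it is hypothetical: the entity names, the keyword lists, and the `KNOWN_PAIRS` stand-in for a database lookup are invented, and majority voting is a simplification that may differ from how the authors aggregate label function outputs.

```python
from collections import Counter

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1

# Hypothetical stand-in for a database of known compound-gene pairs.
KNOWN_PAIRS = {("compound_a", "gene_b")}


def lf_in_database(sentence: str, compound: str, gene: str) -> int:
    """Database-derived label function: vote positive for known pairs."""
    return POSITIVE if (compound, gene) in KNOWN_PAIRS else ABSTAIN


def lf_binding_keyword(sentence: str, compound: str, gene: str) -> int:
    """Text heuristic: vote positive when a binding-like verb is present."""
    keywords = ("binds", "inhibits", "targets")
    return POSITIVE if any(k in sentence.lower() for k in keywords) else ABSTAIN


def lf_negation(sentence: str, compound: str, gene: str) -> int:
    """Text heuristic: vote negative when the sentence is explicitly negated."""
    cues = ("does not", "no effect")
    return NEGATIVE if any(c in sentence.lower() for c in cues) else ABSTAIN


LABEL_FUNCTIONS = [lf_in_database, lf_binding_keyword, lf_negation]


def majority_vote(sentence: str, compound: str, gene: str) -> int:
    """Combine non-abstaining votes; abstain when nothing fires or on a tie."""
    votes = [lf(sentence, compound, gene) for lf in LABEL_FUNCTIONS]
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts:
        return ABSTAIN
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


print(majority_vote("compound_a binds gene_b in vitro", "compound_a", "gene_b"))
```

In this toy setup, re-using a label function across edge types would amount to applying, say, `lf_binding_keyword` to gene-gene sentence pairs as well, which is the kind of transfer the study evaluates.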

RESULTS

We obtained entity-tagged abstracts and subsetted these entities to contain only compound, gene, and disease mentions. We extracted sentences containing co-mentions of certain biomedical entities contained in a previously described knowledge graph, Hetionet v1. We trained a baseline model that used database-only label functions and then used a sampling approach to measure how well adding edge-specific or edge-mismatch label function combinations improved over our baseline. Next, we trained a discriminator model to detect sentences that indicated a biomedical relationship and then estimated the number of edge types that could be recalled and added to Hetionet v1. We found that adding edge-mismatch label functions rarely improved relationship extraction, while control edge-specific label functions did. There were two exceptions to this trend, Compound-binds-Gene and Gene-interacts-Gene, which both indicated physical relationships and showed signs of transferability. Across the scenarios tested, discriminative model performance strongly depends on generated annotations. Using the best discriminative model for each edge type, we recalled close to 30% of established edges within Hetionet v1.

CONCLUSIONS

Our results show that this framework can incorporate novel edges into our source knowledge graph. However, results with label function transfer were mixed. Only label functions describing very similar edge types supported improved performance when transferred. We expect that the continued development of this strategy may provide essential building blocks to populating biomedical knowledge graphs with discoveries, ensuring that these resources include cutting-edge results.

Luo L, Wei CH, Lai PT, Chen Q, Islamaj R, Lu Z. Assigning species information to corresponding genes by a sequence labeling framework. Database (Oxford) 2022;2022:6760187. [PMID: 36227127 PMCID: PMC9558450 DOI: 10.1093/database/baac090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2022] [Revised: 08/26/2022] [Accepted: 10/11/2022] [Indexed: 01/24/2023]

Luo L, Lai PT, Wei CH, Arighi CN, Lu Z. BioRED: a rich biomedical relation extraction dataset. Brief Bioinform 2022;23:6645993. [PMID: 35849818 PMCID: PMC9487702 DOI: 10.1093/bib/bbac282] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2022] [Revised: 06/02/2022] [Accepted: 06/19/2022] [Indexed: 11/13/2022] Open

Wei CH, Allot A, Riehle K, Milosavljevic A, Lu Z. tmVar 3.0: an improved variant concept recognition and normalization tool. Bioinformatics 2022;38:4449-4451. [PMID: 35904569 PMCID: PMC9477515 DOI: 10.1093/bioinformatics/btac537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2022] [Revised: 07/07/2022] [Accepted: 07/27/2022] [Indexed: 12/24/2022] Open

Xu Q, Liu Y, Hu J, Duan X, Song N, Zhou J, Zhai J, Su J, Liu S, Chen F, Zheng W, Guo Z, Li H, Zhou Q, Niu B. OncoPubMiner: a platform for mining oncology publications. Brief Bioinform 2022;23:6691792. [PMID: 36058206 DOI: 10.1093/bib/bbac383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/12/2022] Open

Affiliation(s)

Quan Xu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Yueyue Liu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Jifang Hu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xiaohong Duan ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Niuben Song ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Jiale Zhou ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Jincheng Zhai ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Junyan Su ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Siyao Liu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Fan Chen ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Wei Zheng The Department of Nephrology and Hypertension Medicine, Beijing Electric Power Hospital, Beijing 100073, China
Zhongjia Guo ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Hexiang Li ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Qiming Zhou ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Beifang Niu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Sung M, Jeong M, Choi Y, Kim D, Lee J, Kang J. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics 2022;38:4837-4839. [PMID: 36053172 PMCID: PMC9563680 DOI: 10.1093/bioinformatics/btac598] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2022] [Revised: 07/09/2022] [Accepted: 08/31/2022] [Indexed: 11/14/2022] Open

Lin PC, Tsai YS, Yeh YM, Shen MR. Cutting-Edge AI Technologies Meet Precision Medicine to Improve Cancer Care. Biomolecules 2022;12:biom12081133. [PMID: 36009026 PMCID: PMC9405970 DOI: 10.3390/biom12081133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022] Open

Garda S, Lenihan-Geels F, Proft S, Hochmuth S, Schülke M, Seelow D, Leser U. RegEl corpus: identifying DNA regulatory elements in the scientific literature. Database (Oxford) 2022;2022:6618549. [PMID: 35758881 PMCID: PMC9235371 DOI: 10.1093/database/baac043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 05/25/2022] [Accepted: 06/02/2022] [Indexed: 11/17/2022]

Gyori BM, Hoyt CT, Steppi A. Gilda: biomedical entity text normalization with machine-learned disambiguation as a service. Bioinform Adv 2022;2:vbac034. [PMID: 36699362 PMCID: PMC9710686 DOI: 10.1093/bioadv/vbac034] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/11/2021] [Revised: 04/27/2022] [Accepted: 05/06/2022] [Indexed: 01/28/2023]

Li PH, Chen TF, Yu JY, Shih SH, Su CH, Lin YH, Tsai HK, Juan HF, Chen CY, Huang JH. pubmedKB: an interactive web server for exploring biomedical entity relations in the biomedical literature. Nucleic Acids Res 2022;50:W616-W622. [PMID: 35536289 PMCID: PMC9252824 DOI: 10.1093/nar/gkac310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2022] [Revised: 04/06/2022] [Accepted: 04/18/2022] [Indexed: 11/15/2022] Open

Zhu X, Gu Y, Xiao Z. HerbKG: Constructing a Herbal-Molecular Medicine Knowledge Graph Using a Two-Stage Framework Based on Deep Transfer Learning. Front Genet 2022;13:799349. [PMID: 35571049 PMCID: PMC9091197 DOI: 10.3389/fgene.2022.799349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Accepted: 04/05/2022] [Indexed: 11/13/2022] Open

Alshahrani M, Almansour A, Alkhaldi A, Thafar MA, Uludag M, Essack M, Hoehndorf R. Combining biomedical knowledge graphs and text to improve predictions for drug-target interactions and drug-indications. PeerJ 2022;10:e13061. [PMID: 35402106 PMCID: PMC8988936 DOI: 10.7717/peerj.13061] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Accepted: 02/13/2022] [Indexed: 01/11/2023] Open

Elangovan A, Li Y, Pires DEV, Davis MJ, Verspoor K. Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT. BMC Bioinformatics 2022;23:4. [PMID: 34983371 PMCID: PMC8729035 DOI: 10.1186/s12859-021-04504-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open

Abstract

MOTIVATION

Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, and this annotation is performed mainly through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation and aid human curation by applying deep learning to distantly supervised training data to extract PPIs, along with their pairwise PTM, from the literature.

METHOD

We use the IntAct PPI database to create a distantly supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high-confidence predictions.
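
The selection rule described in this paragraph, high average confidence combined with low variation across ensemble members, can be sketched as follows. This is an illustrative Python snippet rather than the authors' code: the use of the population standard deviation as the measure of variation, the threshold values, and the function name are assumptions.

```python
from statistics import mean, pstdev


def is_high_confidence(member_confidences, min_mean=0.9, max_std=0.05):
    """Keep a prediction only if the ensemble is confident and in agreement.

    member_confidences: one confidence score per ensemble member for the
                        same predicted class.
    min_mean, max_std:  hypothetical thresholds, to be tuned for precision.
    """
    average = mean(member_confidences)
    spread = pstdev(member_confidences)  # variation across ensemble members
    return average >= min_mean and spread <= max_std


# Members agree and are confident: kept.
print(is_high_confidence([0.95, 0.96, 0.94]))
# High average but members disagree: rejected.
print(is_high_confidence([1.0, 1.0, 0.7]))
# Members agree but are not confident: rejected.
print(is_high_confidence([0.6, 0.6, 0.6]))
```

Raising `min_mean` or lowering `max_std` trades recall for precision, which matches the tuning for precision that the results section describes.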

RESULTS AND CONCLUSION

The PPI-BioBERT-x10 model evaluated on the test set achieved a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high-quality predictions and tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts, extracted 1.6 million PTM-PPI predictions (546507 unique PTM-PPI triplets), and filtered [Formula: see text] (4584 unique) high-confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration, which highlights the challenges of generalisability beyond the test set. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
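
The final filtering step, keeping only predictions associated with multiple papers, might look like the following Python sketch. The data layout (pairs of a paper identifier and a protein-PTM-protein triplet), the placeholder identifiers, and the `min_papers` parameter are assumptions made for illustration.

```python
from collections import defaultdict


def supported_by_multiple_papers(predictions, min_papers=2):
    """Return the triplets predicted from at least `min_papers` distinct papers.

    predictions: iterable of (paper_id, triplet) pairs, where a triplet is a
                 hashable (protein_1, ptm_type, protein_2) tuple.
    """
    papers_per_triplet = defaultdict(set)
    for paper_id, triplet in predictions:
        papers_per_triplet[triplet].add(paper_id)  # a set ignores repeat mentions
    return {triplet for triplet, papers in papers_per_triplet.items()
            if len(papers) >= min_papers}


# Placeholder identifiers; none of these are real records.
predictions = [
    ("paper_1", ("P1", "phosphorylation", "P2")),
    ("paper_2", ("P1", "phosphorylation", "P2")),
    ("paper_2", ("P3", "methylation", "P4")),
    ("paper_2", ("P3", "methylation", "P4")),  # same paper twice counts once
]

print(supported_by_multiple_papers(predictions))
```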

El Idrissi F, Fruchart M, Belarbi K, Lamer A, Dubois-Deruy E, Lemdani M, N’Guessan AL, Guinhouya BC, Zitouni D. Exploration of the core protein network under endometriosis symptomatology using a computational approach. Front Endocrinol (Lausanne) 2022;13:869053. [PMID: 36120440 PMCID: PMC9478376 DOI: 10.3389/fendo.2022.869053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/09/2022] [Accepted: 08/17/2022] [Indexed: 11/13/2022] Open

Chen HO, Lin PC, Liu CR, Wang CS, Chiang JH. Contextualizing Genes by Using Text-Mined Co-Occurrence Features for Cancer Gene Panel Discovery. Front Genet 2021;12:771435. [PMID: 34759963 PMCID: PMC8573063 DOI: 10.3389/fgene.2021.771435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Accepted: 10/11/2021] [Indexed: 12/13/2022] Open

Larmande P, Liu Y, Yao X, Xia J. OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition. Genomics Inform 2021;19:e27. [PMID: 34638174 PMCID: PMC8510865 DOI: 10.5808/gi.21015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 07/27/2021] [Indexed: 12/02/2022] Open

Ostaszewski M, Niarakis A, Mazein A, Kuperstein I, Phair R, Orta‐Resendiz A, Singh V, Aghamiri SS, Acencio ML, Glaab E, Ruepp A, Fobo G, Montrone C, Brauner B, Frishman G, Monraz Gómez LC, Somers J, Hoch M, Kumar Gupta S, Scheel J, Borlinghaus H, Czauderna T, Schreiber F, Montagud A, Ponce de Leon M, Funahashi A, Hiki Y, Hiroi N, Yamada TG, Dräger A, Renz A, Naveez M, Bocskei Z, Messina F, Börnigen D, Fergusson L, Conti M, Rameil M, Nakonecnij V, Vanhoefer J, Schmiester L, Wang M, Ackerman EE, Shoemaker JE, Zucker J, Oxford K, Teuton J, Kocakaya E, Summak GY, Hanspers K, Kutmon M, Coort S, Eijssen L, Ehrhart F, Rex DAB, Slenter D, Martens M, Pham N, Haw R, Jassal B, Matthews L, Orlic‐Milacic M, Senff Ribeiro A, Rothfels K, Shamovsky V, Stephan R, Sevilla C, Varusai T, Ravel J, Fraser R, Ortseifen V, Marchesi S, Gawron P, Smula E, Heirendt L, Satagopam V, Wu G, Riutta A, Golebiewski M, Owen S, Goble C, Hu X, Overall RW, Maier D, Bauch A, Gyori BM, Bachman JA, Vega C, Grouès V, Vazquez M, Porras P, Licata L, Iannuccelli M, Sacco F, Nesterova A, Yuryev A, de Waard A, Turei D, Luna A, Babur O, Soliman S, Valdeolivas A, Esteban‐Medina M, Peña‐Chilet M, Rian K, Helikar T, Puniya BL, Modos D, Treveil A, Olbei M, De Meulder B, Ballereau S, Dugourd A, Naldi A, Noël V, Calzone L, Sander C, Demir E, Korcsmaros T, Freeman TC, Augé F, Beckmann JS, Hasenauer J, Wolkenhauer O, Wilighagen EL, Pico AR, Evelo CT, Gillespie ME, Stein LD, Hermjakob H, D'Eustachio P, Saez‐Rodriguez J, Dopazo J, Valencia A, Kitano H, Barillot E, Auffray C, Balling R, Schneider R. COVID19 Disease Map, a computational knowledge repository of virus-host interaction mechanisms. Mol Syst Biol 2021;17:e10387. [PMID: 34664389 PMCID: PMC8524328 DOI: 10.15252/msb.202110387] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2021] [Revised: 08/25/2021] [Accepted: 08/26/2021] [Indexed: 12/13/2022] Open

Affiliation(s)

Marek Ostaszewski Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Anna Niarakis Université Paris‐Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde ‐ Genhotel, Univ Evry, Evry, France; Lifeware Group, Inria Saclay‐Ile de France, Palaiseau, France
Alexander Mazein Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Inna Kuperstein Institut Curie, PSL Research University, Paris, France; INSERM, Paris, France; MINES ParisTech, PSL Research University, Paris, France
Robert Phair Integrative Bioinformatics, Inc., Mountain View, CA, USA
Aurelio Orta‐Resendiz Institut Pasteur, Université de Paris, Unité HIV, Inflammation et Persistance, Paris, France; Bio Sorbonne Paris Cité, Université de Paris, Paris, France
Vidisha Singh Université Paris‐Saclay, Laboratoire Européen de Recherche pour la Polyarthrite rhumatoïde ‐ Genhotel, Univ Evry, Evry, France
Sara Sadat Aghamiri Inserm‐ Institut national de la santé et de la recherche médicale, Paris, France
Marcio Luis Acencio Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Enrico Glaab Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Andreas Ruepp Institute of Experimental Genetics (IEG), Helmholtz Zentrum München‐German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Gisela Fobo Institute of Experimental Genetics (IEG), Helmholtz Zentrum München‐German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Corinna Montrone Institute of Experimental Genetics (IEG), Helmholtz Zentrum München‐German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Barbara Brauner Institute of Experimental Genetics (IEG), Helmholtz Zentrum München‐German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Goar Frishman Institute of Experimental Genetics (IEG), Helmholtz Zentrum München‐German Research Center for Environmental Health (GmbH), Neuherberg, Germany
Luis Cristóbal Monraz Gómez Institut Curie, PSL Research University, Paris, France; INSERM, Paris, France; MINES ParisTech, PSL Research University, Paris, France
Julia Somers Department of Molecular and Medical Genetics, Oregon Health & Sciences University, Portland, OR, USA
Matti Hoch Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
Shailendra Kumar Gupta Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
Julia Scheel Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
Hanna Borlinghaus Department of Computer and Information Science, University of Konstanz, Konstanz, Germany
Tobias Czauderna Faculty of Information Technology, Department of Human‐Centred Computing, Monash University, Clayton, Vic., Australia
Falk Schreiber Department of Computer and Information Science, University of Konstanz, Konstanz, Germany; Faculty of Information Technology, Department of Human‐Centred Computing, Monash University, Clayton, Vic., Australia
Arnau Montagud Barcelona Supercomputing Center (BSC), Barcelona, Spain
Miguel Ponce de Leon Barcelona Supercomputing Center (BSC), Barcelona, Spain
Akira Funahashi Department of Biosciences and Informatics, Keio University, Yokohama, Japan
Yusuke Hiki Department of Biosciences and Informatics, Keio University, Yokohama, Japan
Noriko Hiroi Graduate School of Media and Governance, Research Institute at SFC, Keio University, Kanagawa, Japan
Takahiro G Yamada Department of Biosciences and Informatics, Keio University, Yokohama, Japan
Andreas Dräger Computational Systems Biology of Infections and Antimicrobial‐Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany; Department of Computer Science, University of Tübingen, Tübingen, Germany; German Center for Infection Research (DZIF), partner site, Tübingen, Germany
Alina Renz Computational Systems Biology of Infections and Antimicrobial‐Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany; Department of Computer Science, University of Tübingen, Tübingen, Germany
Muhammad Naveez Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany; Institute of Applied Computer Systems, Riga Technical University, Riga, Latvia
Zsolt Bocskei Sanofi R&D, Translational Sciences, Chilly‐Mazarin, France
Francesco Messina Dipartimento di Epidemiologia Ricerca Pre‐Clinica e Diagnostica Avanzata, National Institute for Infectious Diseases 'Lazzaro Spallanzani' I.R.C.C.S., Rome, Italy; COVID‐19 INMI Network Medicine for IDs Study Group, National Institute for Infectious Diseases 'Lazzaro Spallanzani' I.R.C.C.S, Rome, Italy
Daniela Börnigen Bioinformatics Core Facility, Universitätsklinikum Hamburg‐Eppendorf, Hamburg, Germany
Liam Fergusson Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, UK
Marta Conti Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Marius Rameil Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Vanessa Nakonecnij Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Jakob Vanhoefer Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Leonard Schmiester Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany; Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Technische Universität München, Garching, Germany
Muying Wang Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Emily E Ackerman Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Jason E Shoemaker Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA, USA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
Jeremy Zucker Pacific Northwest National Laboratory, Richland, WA, USA
Kristie Oxford Pacific Northwest National Laboratory, Richland, WA, USA
Jeremy Teuton Pacific Northwest National Laboratory, Richland, WA, USA
Ebru Kocakaya Stem Cell Institute, Ankara University, Ankara, Turkey
Gökçe Yağmur Summak Stem Cell Institute, Ankara University, Ankara, Turkey
Kristina Hanspers Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
Martina Kutmon Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
Susan Coort Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Lars Eijssen Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands; Maastricht University Medical Centre, Maastricht, The Netherlands
Friederike Ehrhart Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands; Maastricht University Medical Centre, Maastricht, The Netherlands
Devasahayam Arokia Balaya Rex Center for Systems Biology and Molecular Medicine, Yenepoya (Deemed to be University), Mangalore, India
Denise Slenter Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Marvin Martens Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Nhung Pham Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Robin Haw MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
Bijay Jassal MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
Lisa Matthews NYU Grossman School of Medicine, New York, NY, USA
Marija Orlic‐Milacic MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
Andrea Senff Ribeiro MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada; Universidade Federal do Paraná, Curitiba, Brasil
Karen Rothfels MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
Veronica Shamovsky NYU Grossman School of Medicine, New York, NY, USA
Ralf Stephan MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada
Cristoffer Sevilla European Bioinformatics Institute (EMBL‐EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
Thawfeek Varusai European Bioinformatics Institute (EMBL‐EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
Jean‐Marie Ravel INSERM UMR_S 1256, Nutrition, Genetics, and Environmental Risk Exposure (NGERE), Faculty of Medicine of Nancy, University of Lorraine, Nancy, France; Laboratoire de génétique médicale, CHRU Nancy, Nancy, France
Rupsha Fraser Queen's Medical Research Institute, The University of Edinburgh, Edinburgh, UK
Vera Ortseifen Senior Research Group in Genome Research of Industrial Microorganisms, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Silvia Marchesi Department of Surgical Science, Uppsala University, Uppsala, Sweden
Piotr Gawron Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg; Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Ewa Smula Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Laurent Heirendt Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Venkata Satagopam Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Guanming Wu Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
Anders Riutta Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
Martin Golebiewski Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
Stuart Owen Department of Computer Science, The University of Manchester, Manchester, UK
Carole Goble Department of Computer Science, The University of Manchester, Manchester, UK
Xiaoming Hu Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
Rupert W Overall German Center for Neurodegenerative Diseases (DZNE) Dresden, Dresden, Germany; Center for Regenerative Therapies Dresden (CRTD), Technische Universität Dresden, Dresden, Germany; Institute for Biology, Humboldt University of Berlin, Berlin, Germany
Dieter Maier Biomax Informatics AG, Planegg, Germany
Angela Bauch Biomax Informatics AG, Planegg, Germany
Benjamin M Gyori Harvard Medical School, Laboratory of Systems Pharmacology, Boston, MA, USA
John A Bachman Harvard Medical School, Laboratory of Systems Pharmacology, Boston, MA, USA
Carlos Vega Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Valentin Grouès Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Miguel Vazquez Barcelona Supercomputing Center (BSC), Barcelona, Spain
Pablo Porras European Bioinformatics Institute (EMBL‐EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
Luana Licata Department of Biology, University of Rome Tor Vergata, Rome, Italy
Marta Iannuccelli Department of Biology, University of Rome Tor Vergata, Rome, Italy
Francesca Sacco Department of Biology, University of Rome Tor Vergata, Rome, Italy
Anastasia Nesterova Elsevier, Philadelphia, PA, USA
Anton Yuryev Elsevier, Philadelphia, PA, USA
Anita de Waard Research Collaborations Unit, Elsevier, Jericho, VT, USA
Denes Turei Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
Augustin Luna cBio Center, Divisions of Biostatistics and Computational Biology, Department of Data Sciences, Dana‐Farber Cancer Institute, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA
Ozgun Babur Computer Science Department, University of Massachusetts Boston, Boston, MA, USA
Sylvain Soliman Lifeware Group, Inria Saclay‐Ile de France, Palaiseau, France
Alberto Valdeolivas Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
Marina Esteban‐Medina Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain; Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain
Maria Peña‐Chilet Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain; Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain; Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla, Spain
Kinza Rian Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain; Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain
Tomáš Helikar Department of Biochemistry, University of Nebraska‐Lincoln, Lincoln, NE, USA
Bhanwar Lal Puniya Department of Biochemistry, University of Nebraska‐Lincoln, Lincoln, NE, USA
Dezso Modos Quadram Institute Bioscience, Norwich, UK; Earlham Institute, Norwich, UK
Agatha Treveil Quadram Institute Bioscience, Norwich, UK; Earlham Institute, Norwich, UK
Marton Olbei Quadram Institute Bioscience, Norwich, UK; Earlham Institute, Norwich, UK
Bertrand De Meulder European Institute for Systems Biology and Medicine (EISBM), Vourles, France
Stephane Ballereau Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Aurélien Dugourd Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany; Institute of Experimental Medicine and Systems Biology, Faculty of Medicine, RWTH Aachen University, Aachen, Germany
Aurélien Naldi Lifeware Group, Inria Saclay‐Ile de France, Palaiseau, France
Vincent Noël Institut Curie, PSL Research University, Paris, France; INSERM, Paris, France; MINES ParisTech, PSL Research University, Paris, France
Laurence Calzone Institut Curie, PSL Research University, Paris, France; INSERM, Paris, France; MINES ParisTech, PSL Research University, Paris, France
Chris Sander cBio Center, Divisions of Biostatistics and Computational Biology, Department of Data Sciences, Dana‐Farber Cancer Institute, Boston, MA, USA; Department of Cell Biology, Harvard Medical School, Boston, MA, USA
Emek Demir Department of Molecular and Medical Genetics, Oregon Health & Sciences University, Portland, OR, USA
Tamas Korcsmaros Quadram Institute Bioscience, Norwich, UK; Earlham Institute, Norwich, UK
Tom C Freeman The Roslin Institute, University of Edinburgh, Edinburgh, UK
Franck Augé Sanofi R&D, Translational Sciences, Chilly‐Mazarin, France
Jacques S Beckmann University of Lausanne, Lausanne, Switzerland
Jan Hasenauer Helmholtz Zentrum München – German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany; Interdisciplinary Research Unit Mathematics and Life Sciences, University of Bonn, Bonn, Germany
Olaf Wolkenhauer Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
Egon L Wilighagen Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Alexander R Pico Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
Chris T Evelo Department of Bioinformatics ‐ BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
Marc E Gillespie MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada; St. John’s University College of Pharmacy and Health Sciences, Queens, NY, USA
Lincoln D Stein MaRS Centre, Ontario Institute for Cancer Research, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
Henning Hermjakob European Bioinformatics Institute (EMBL‐EBI), European Molecular Biology Laboratory, Hinxton, Cambridgeshire, UK
Peter D'Eustachio NYU Grossman School of Medicine, New York, NY, USA
Julio Saez‐Rodriguez Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
Joaquin Dopazo Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocio, Sevilla, Spain; Computational Systems Medicine Group, Institute of Biomedicine of Seville (IBIS), Hospital Virgen del Rocio, Sevilla, Spain; Bioinformatics in Rare Diseases (BiER), Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla, Spain; FPS/ELIXIR‐es, Hospital Virgen del Rocío, Sevilla, Spain
Alfonso Valencia Barcelona Supercomputing Center (BSC), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
Hiroaki Kitano Systems Biology Institute, Tokyo, Japan; Okinawa Institute of Science and Technology Graduate School, Okinawa, Japan
Emmanuel Barillot Institut Curie, PSL Research University, Paris, France; INSERM, Paris, France; MINES ParisTech, PSL Research University, Paris, France
Charles Auffray Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
Rudi Balling Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
Reinhard Schneider Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch‐sur‐Alzette, Luxembourg
the COVID‐19 Disease Map Community


Parolo S, Tomasoni D, Bora P, Ramponi A, Kaddi C, Azer K, Domenici E, Neves-Zaph S, Lombardo R. Reconstruction of the Cytokine Signaling in Lysosomal Storage Diseases by Literature Mining and Network Analysis. Front Cell Dev Biol 2021;9:703489. [PMID: 34490253 PMCID: PMC8417786 DOI: 10.3389/fcell.2021.703489] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Accepted: 07/30/2021] [Indexed: 11/13/2022]

Yang X, Wu C, Nenadic G, Wang W, Lu K. Mining a stroke knowledge graph from literature. BMC Bioinformatics 2021;22:387. [PMID: 34325669 PMCID: PMC8319697 DOI: 10.1186/s12859-021-04292-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 01/01/2023]

Schmidt CO, Fluck J, Golebiewski M, Grabenhenrich L, Hahn H, Kirsten T, Klammt S, Löbe M, Sax U, Thun S, Pigeot I. [Making COVID-19 research data more accessible-building a nationwide information infrastructure]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2021;64:1084-1092. [PMID: 34297162 PMCID: PMC8298983 DOI: 10.1007/s00103-021-03386-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Accepted: 06/28/2021] [Indexed: 11/24/2022]

Bauer C, Herwig R, Lienhard M, Prasse P, Scheffer T, Schuchhardt J. Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types. J Transl Med 2021;19:274. [PMID: 34174885 PMCID: PMC8236166 DOI: 10.1186/s12967-021-02941-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2021] [Accepted: 06/13/2021] [Indexed: 12/09/2022]

Abstract

Background

There is a huge body of scientific literature describing the relation between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually.

Methods

In order to cope with the large amount of literature we applied an automated text mining approach to assess the relations between 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches, a classical text mining based on named entity recognition and an AI-based approach employing word embeddings. The consistency of literature mining results was validated with 3 independent methods: first, using data from FDA approvals, second, using experimentally measured IC-50 cell line data and third, using clinical patient survival data.

Results

We demonstrated that the automated text mining was able to successfully assess the relation between cancer types and anti-cancer drugs. All validation methods showed a good correspondence between the results from literature mining and independent confirmatory approaches. The relation between most frequent cancer types and drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base using the following link: https://knowledgebase.microdiscovery.de/heatmap.

Conclusions

Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both, cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12967-021-02941-z.


Birgmeier J, Haeussler M, Deisseroth CA, Steinberg EH, Jagadeesh KA, Ratner AJ, Guturu H, Wenger AM, Diekhans ME, Stenson PD, Cooper DN, Ré C, Beggs AH, Bernstein JA, Bejerano G. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci Transl Med 2021;12:12/544/eaau9113. [PMID: 32434849 DOI: 10.1126/scitranslmed.aau9113] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 07/30/2018] [Revised: 08/14/2019] [Accepted: 04/22/2020] [Indexed: 12/21/2022]

Islamaj R, Wei CH, Cissel D, Miliaras N, Printseva O, Rodionov O, Sekiya K, Ward J, Lu Z. NLM-Gene, a richly annotated gold standard dataset for gene entities that addresses ambiguity and multi-species gene recognition. J Biomed Inform 2021;118:103779. [PMID: 33839304 PMCID: PMC11037554 DOI: 10.1016/j.jbi.2021.103779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Revised: 03/14/2021] [Accepted: 04/05/2021] [Indexed: 10/21/2022]

Lee K, Wei CH, Lu Z. Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature. Brief Bioinform 2021;22:bbaa142. [PMID: 32770181 PMCID: PMC8138883 DOI: 10.1093/bib/bbaa142] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/01/2020] [Revised: 06/07/2020] [Accepted: 06/25/2020] [Indexed: 12/28/2022]

Azer K, Kaddi CD, Barrett JS, Bai JPF, McQuade ST, Merrill NJ, Piccoli B, Neves-Zaph S, Marchetti L, Lombardo R, Parolo S, Immanuel SRC, Baliga NS. History and Future Perspectives on the Discipline of Quantitative Systems Pharmacology Modeling and Its Applications. Front Physiol 2021;12:637999. [PMID: 33841175 PMCID: PMC8027332 DOI: 10.3389/fphys.2021.637999] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/07/2020] [Accepted: 01/25/2021] [Indexed: 12/24/2022]

Rahman P, Nandi A, Hebert C. Amplifying Domain Expertise in Clinical Data Pipelines. JMIR Med Inform 2020;8:e19612. [PMID: 33151150 PMCID: PMC7677017 DOI: 10.2196/19612] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/24/2020] [Revised: 07/07/2020] [Accepted: 07/22/2020] [Indexed: 11/28/2022]

Perera N, Dehmer M, Emmert-Streib F. Named Entity Recognition and Relation Detection for Biomedical Information Extraction. Front Cell Dev Biol 2020;8:673. [PMID: 32984300 PMCID: PMC7485218 DOI: 10.3389/fcell.2020.00673] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 12/12/2019] [Accepted: 07/02/2020] [Indexed: 12/29/2022]

Saberian N, Shafi A, Peyvandipour A, Draghici S. MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature. Sci Rep 2020;10:12365. [PMID: 32703994 PMCID: PMC7378213 DOI: 10.1038/s41598-020-68649-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/13/2019] [Accepted: 06/17/2020] [Indexed: 11/09/2022]

Wei CH, Allot A, Leaman R, Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res 2020;47:W587-W593. [PMID: 31114887 DOI: 10.1093/nar/gkz389] [Citation(s) in RCA: 175] [Impact Index Per Article: 43.8] [Received: 02/17/2019] [Revised: 04/08/2019] [Accepted: 04/30/2019] [Indexed: 11/12/2022]

Kilicoglu H, Rosemblat G, Fiszman M, Shin D. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinformatics 2020;21:188. [PMID: 32410573 PMCID: PMC7222583 DOI: 10.1186/s12859-020-3517-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 02/18/2020] [Accepted: 04/29/2020] [Indexed: 11/10/2022]

Abstract

BACKGROUND

In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.

RESULTS

A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level.

CONCLUSIONS

SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
