1
|
Zheng H, Xu L, Xie H, Xie J, Ma Y, Hu Y, Wu L, Chen J, Wang M, Yi Y, Huang Y, Wang D. RIscoper 2.0: A deep learning tool to extract RNA biomedical relation sentences from literature. Comput Struct Biotechnol J 2024; 23:1469-1476. [PMID: 38623560 PMCID: PMC11016866 DOI: 10.1016/j.csbj.2024.03.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2024] [Revised: 03/15/2024] [Accepted: 03/21/2024] [Indexed: 04/17/2024] Open
Abstract
RNA plays an extensive role in a multi-dimensional regulatory system, and its biomedical relationships are scattered across numerous biological studies. However, text mining works dedicated to the extraction of RNA biomedical relations remain limited. In this study, we established a comprehensive and reliable corpus of RNA biomedical relations, recruiting over 30,000 sentences manually curated from more than 15,000 biomedical literature. We also updated RIscoper 2.0, a BERT-based deep learning tool to extract RNA biomedical relation sentences from literature. Benefiting from approximately 100,000 annotated named entities, we integrated the text classification and named entity recognition tasks in this tool. Additionally, RIscoper 2.0 outperformed the original tool in both tasks and can discover new RNA biomedical relations. Additionally, we provided a user-friendly online search tool that enables rapid scanning of RNA biomedical relationships using local and online resources. Both the online tools and data resources of RIscoper 2.0 are available at http://www.rnainter.org/riscoper.
Collapse
Affiliation(s)
- Hailong Zheng
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Linfu Xu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Hailong Xie
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Jiajing Xie
- National Institute for Data Science in Health and Medicine, Xiamen University, 361102 Xiamen, China
| | - Yapeng Ma
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Yongfei Hu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Le Wu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Jia Chen
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Meiyi Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Ying Yi
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Yan Huang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
| | - Dong Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, 510515 Guangzhou, China
- Guangdong Province Key Laboratory of Molecular Tumor Pathology, 510515, Guangzhou, China
| |
Collapse
|
2
|
Lyons EL, Watson D, Alodadi MS, Haugabook SJ, Tawa GJ, Hannah-Shmouni F, Porter FD, Collins JR, Ottinger EA, Mudunuri US. Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus. BMC Genomics 2023; 24:460. [PMID: 37587458 PMCID: PMC10433598 DOI: 10.1186/s12864-023-09561-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Accepted: 08/08/2023] [Indexed: 08/18/2023] Open
Abstract
BACKGROUND Approximately 4-8% of the world suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. RESULTS This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. CONCLUSIONS Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important to make pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants, however such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing for identifying diseases, genes and variants, along with impact prediction algorithms and integrating with details on clinical phenotypes and functional assessments might be a promising approach to scale literature mining of variants and assigning correct pathogenicity. The curated variants list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.
Collapse
Affiliation(s)
- Erica L Lyons
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Daniel Watson
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Mohammad S Alodadi
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Sharie J Haugabook
- Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Gregory J Tawa
- Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Fady Hannah-Shmouni
- Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Forbes D Porter
- Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Jack R Collins
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Elizabeth A Ottinger
- Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA.
| | - Uma S Mudunuri
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
| |
Collapse
|
3
|
Lee H, Jeon J, Jung D, Won JI, Kim K, Kim YJ, Yoon J. RelCurator: a text mining-based curation system for extracting gene-phenotype relationships specific to neurodegenerative disorders. Genes Genomics 2023:10.1007/s13258-023-01405-6. [PMID: 37300788 DOI: 10.1007/s13258-023-01405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2023] [Accepted: 05/18/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND The identification of gene-phenotype relationships is important in medical genetics as it serves as a basis for precision medicine. However, most of the gene-phenotype relationship data are buried in the biomedical literature in textual form. OBJECTIVE We propose RelCurator, a curation system that extracts sentences including both gene and phenotype entities related to specific disease categories from PubMed articles, provides rich additional information such as entity taggings, and predictions of gene-phenotype relationships. METHODS We targeted neurodegenerative disorders and developed a deep learning model using Bidirectional Gated Recurrent Unit (BiGRU) networks and BioWordVec word embeddings for predicting gene-phenotype relationships from biomedical texts. The prediction model is trained with more than 130,000 labeled PubMed sentences including gene and phenotype entities, which are related to or unrelated to neurodegenerative disorders. RESULTS We compared the performance of our deep learning model with those of Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machine (SVM), and simple Recurrent Neural Network (simple RNN) models. Our model performed better with an F1-score of 0.96. Furthermore, the evaluation done using a few curation cases in the real scenario showed the effectiveness of our work. Therefore, we conclude that RelCurator can identify not only new causative genes, but also new genes associated with neurodegenerative disorders' phenotype. CONCLUSION RelCurator is a user-friendly method for accessing deep learning-based supporting information and a concise web interface to assist curators while browsing the PubMed articles. Our curation process represents an important and broadly applicable improvement to the state of the art for the curation of gene-phenotype relationships.
Collapse
Affiliation(s)
- Heonwoo Lee
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200- 702, Republic of Korea
| | - Junbeom Jeon
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200- 702, Republic of Korea
| | - Dawoon Jung
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200- 702, Republic of Korea
| | - Jung-Im Won
- Center for Innovation in Engineering Education, Hanyang University, Seoul, Republic of Korea
| | - Kiyong Kim
- Department of Electronic Engineering, Kyonggi University, Suwon, Republic of Korea
| | - Yun Joong Kim
- Department of Neurology, Yonsei University College of Medicine, Seoul, Republic of Korea.
- Department of Neurology, Yongin Severance Hospital, Yonsei University College of Medicine, Yonsei University Health System, Yongin, Gyeonggi-do, 16995, Republic of Korea.
| | - Jeehee Yoon
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200- 702, Republic of Korea.
| |
Collapse
|
4
|
Bokharaeian B, Dehghani M, Diaz A. Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method. BMC Bioinformatics 2023; 24:144. [PMID: 37046202 PMCID: PMC10099837 DOI: 10.1186/s12859-023-05236-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2022] [Accepted: 03/17/2023] [Indexed: 04/14/2023] Open
Abstract
Extraction of associations of singular nucleotide polymorphism (SNP) and phenotypes from biomedical literature is a vital task in BioNLP. Recently, some methods have been developed to extract mutation-diseases affiliations. However, no accessible method of extracting associations of SNP-phenotype from content considers their degree of certainty. In this paper, several machine learning methods were developed to extract ranked SNP-phenotype associations from biomedical abstracts and then were compared to each other. In addition, shallow machine learning methods, including random forest, logistic regression, and decision tree and two kernel-based methods like subtree and local context, a rule-based and a deep CNN-LSTM-based and two BERT-based methods were developed in this study to extract associations. Furthermore, the experiments indicated that although the used linguist features could be employed to implement a superior association extraction method outperforming the kernel-based counterparts, the used deep learning and BERT-based methods exhibited the best performance. However, the used PubMedBERT-LSTM outperformed the other developed methods among the used methods. Moreover, similar experiments were conducted to estimate the degree of certainty of the extracted association, which can be used to assess the strength of the reported association. The experiments revealed that our proposed PubMedBERT-CNN-LSTM method outperformed the sophisticated methods on the task.
Collapse
Affiliation(s)
| | - Mohammad Dehghani
- School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
| | - Alberto Diaz
- Facultad Informatica, Complutense University of Madrid, Madrid, Spain
| |
Collapse
|
5
|
Dlamini Z, Skepu A, Kim N, Mkhabele M, Khanyile R, Molefi T, Mbatha S, Setlai B, Mulaudzi T, Mabongo M, Bida M, Kgoebane-Maseko M, Mathabe K, Lockhat Z, Kgokolo M, Chauke-Malinga N, Ramagaga S, Hull R. AI and precision oncology in clinical cancer genomics: From prevention to targeted cancer therapies-an outcomes based patient care. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100965] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
|
6
|
Zeng J, Shufean MA. Molecular-based precision oncology clinical decision making augmented by artificial intelligence. Emerg Top Life Sci 2021; 5:757-764. [PMID: 34874054 PMCID: PMC8786281 DOI: 10.1042/etls20210220] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2021] [Revised: 11/08/2021] [Accepted: 11/16/2021] [Indexed: 01/03/2023]
Abstract
The rapid growth and decreasing cost of Next-generation sequencing (NGS) technologies have made it possible to conduct routine large panel genomic sequencing in many disease settings, especially in the oncology domain. Furthermore, it is now known that optimal disease management of patients depends on individualized cancer treatment guided by comprehensive molecular testing. However, translating results from molecular sequencing reports into actionable clinical insights remains a challenge to most clinicians. In this review, we discuss about some representative systems that leverage artificial intelligence (AI) to facilitate some processes of clinicians' decision making based upon molecular data, focusing on their application in precision oncology. Some limitations and pitfalls of the current application of AI in clinical decision making are also discussed.
Collapse
Affiliation(s)
- Jia Zeng
- Sheikh Khalifa Bin Zayed Al Nahyan Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, U.S.A
| | - Md Abu Shufean
- Sheikh Khalifa Bin Zayed Al Nahyan Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, U.S.A
| |
Collapse
|
7
|
A network representation approach for COVID-19 drug recommendation. Methods 2021; 198:3-10. [PMID: 34562584 PMCID: PMC8458160 DOI: 10.1016/j.ymeth.2021.09.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2021] [Revised: 08/30/2021] [Accepted: 09/19/2021] [Indexed: 12/15/2022] Open
Abstract
The coronavirus disease 2019 (COVID-19) has outbreak since early December 2019, and COVID-19 has caused over 100 million cases and 2 million deaths around the world. After one year of the COVID-19 outbreak, there is no certain and approve medicine against it. Drug repositioning has become one line of scientific research that is being pursued to develop an effective drug. However, due to the lack of COVID-19 data, there is still no specific drug repositioning targeting the COVID-19. In this paper, we propose a framework for COVID-19 drug repositioning. This framework has several advantages that can be exploited: one is that a local graph aggregating representation is used across a heterogeneous network to address the data sparsity problem; another is the multi-hop neighbors of the heterogeneous graph are aggregated to recall as many COVID-19 potential drugs as possible. Our experimental results show that our COVDR framework performs significantly better than baseline methods, and the docking simulation verifies that our three potential drugs have the ability to against COVID-19 disease.
Collapse
|
8
|
Lee K, Lockhart JH, Xie M, Chaudhary R, Slebos RJC, Flores ER, Chung CH, Tan AC. Deep Learning of Histopathology Images at the Single Cell Level. Front Artif Intell 2021; 4:754641. [PMID: 34568816 PMCID: PMC8461055 DOI: 10.3389/frai.2021.754641] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2021] [Accepted: 08/27/2021] [Indexed: 12/12/2022] Open
Abstract
The tumor immune microenvironment (TIME) encompasses many heterogeneous cell types that engage in extensive crosstalk among the cancer, immune, and stromal components. The spatial organization of these different cell types in TIME could be used as biomarkers for predicting drug responses, prognosis and metastasis. Recently, deep learning approaches have been widely used for digital histopathology images for cancer diagnoses and prognoses. Furthermore, some recent approaches have attempted to integrate spatial and molecular omics data to better characterize the TIME. In this review we focus on machine learning-based digital histopathology image analysis methods for characterizing tumor ecosystem. In this review, we will consider three different scales of histopathological analyses that machine learning can operate within: whole slide image (WSI)-level, region of interest (ROI)-level, and cell-level. We will systematically review the various machine learning methods in these three scales with a focus on cell-level analysis. We will provide a perspective of workflow on generating cell-level training data sets using immunohistochemistry markers to "weakly-label" the cell types. We will describe some common steps in the workflow of preparing the data, as well as some limitations of this approach. Finally, we will discuss future opportunities of integrating molecular omics data with digital histopathology images for characterizing tumor ecosystem.
Collapse
Affiliation(s)
- Kyubum Lee
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - John H. Lockhart
- Department of Molecular Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Mengyu Xie
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Ritu Chaudhary
- Department of Head and Neck-Endocrine Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Robbert J. C. Slebos
- Department of Head and Neck-Endocrine Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Elsa R. Flores
- Department of Molecular Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Cancer Biology and Evolution Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Christine H. Chung
- Department of Head and Neck-Endocrine Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Molecular Medicine Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Aik Choon Tan
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Molecular Medicine Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| |
Collapse
|
9
|
Monaco A, Pantaleo E, Amoroso N, Lacalamita A, Lo Giudice C, Fonzino A, Fosso B, Picardi E, Tangaro S, Pesole G, Bellotti R. A primer on machine learning techniques for genomic applications. Comput Struct Biotechnol J 2021; 19:4345-4359. [PMID: 34429852 PMCID: PMC8365460 DOI: 10.1016/j.csbj.2021.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2021] [Revised: 07/23/2021] [Accepted: 07/23/2021] [Indexed: 11/28/2022] Open
Abstract
High throughput sequencing technologies have enabled the study of complex biological aspects at single nucleotide resolution, opening the big data era. The analysis of large volumes of heterogeneous "omic" data, however, requires novel and efficient computational algorithms based on the paradigm of Artificial Intelligence. In the present review, we introduce and describe the most common machine learning methodologies, and lately deep learning, applied to a variety of genomics tasks, trying to emphasize capabilities, strengths and limitations through a simple and intuitive language. We highlight the power of the machine learning approach in handling big data by means of a real life example, and underline how described methods could be relevant in all cases in which large amounts of multimodal genomic data are available.
Collapse
Affiliation(s)
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
| | - Ester Pantaleo
- Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
| | - Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy.,Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
| | - Antonio Lacalamita
- National Institute of Gastroenterology "S. de Bellis", Research Hospital, 70013 Castellana Grotte (Bari), Italy
| | - Claudio Lo Giudice
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
| | - Adriano Fonzino
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
| | - Bruno Fosso
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Ernesto Picardi
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy.,Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Sabina Tangaro
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy.,Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari "Aldo Moro", Bari, Via G. Amendola 165, 70125 Bari, Italy
| | - Graziano Pesole
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy.,Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
| | - Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy.,Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
| |
Collapse
|
10
|
Yang X, Wu C, Nenadic G, Wang W, Lu K. Mining a stroke knowledge graph from literature. BMC Bioinformatics 2021; 22:387. [PMID: 34325669 PMCID: PMC8319697 DOI: 10.1186/s12859-021-04292-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Stroke has an acute onset and a high mortality rate, making it one of the most fatal diseases worldwide. Its underlying biology and treatments have been widely studied both in the "Western" biomedicine and the Traditional Chinese Medicine (TCM). However, these two approaches are often studied and reported in insolation, both in the literature and associated databases. RESULTS To aid research in finding effective prevention methods and treatments, we integrated knowledge from the literature and a number of databases (e.g. CID, TCMID, ETCM). We employed a suite of biomedical text mining (i.e. named-entity) approaches to identify mentions of genes, diseases, drugs, chemicals, symptoms, Chinese herbs and patent medicines, etc. in a large set of stroke papers from both biomedical and TCM domains. Then, using a combination of a rule-based approach with a pre-trained BioBERT model, we extracted and classified links and relationships among stroke-related entities as expressed in the literature. We construct StrokeKG, a knowledge graph includes almost 46 k nodes of nine types, and 157 k links of 30 types, connecting diseases, genes, symptoms, drugs, pathways, herbs, chemical, ingredients and patent medicine. CONCLUSIONS Our Stroke-KG can provide practical and reliable stroke-related knowledge to help with stroke-related research like exploring new directions for stroke research and ideas for drug repurposing and discovery. We make StrokeKG freely available at http://114.115.208.144:7474/browser/ (Please click "Connect" directly) and the source structured data for stroke at https://github.com/yangxi1016/Stroke.
Collapse
Affiliation(s)
- Xi Yang
- College of Computer, National University of Defence Technology, Changsha, 410073 China
- State Key Laboratory of High-Performance Computing, National University of Defence Technology, Changsha, 410073 China
- Department of Computer Science, University of Manchester, Manchester, M13 9PL UK
| | - Chengkun Wu
- State Key Laboratory of High-Performance Computing, National University of Defence Technology, Changsha, 410073 China
| | - Goran Nenadic
- Department of Computer Science, University of Manchester, Manchester, M13 9PL UK
| | - Wei Wang
- College of Computer, National University of Defence Technology, Changsha, 410073 China
| | - Kai Lu
- College of Computer, National University of Defence Technology, Changsha, 410073 China
| |
Collapse
|
11
|
Song B, Li F, Liu Y, Zeng X. Deep learning methods for biomedical named entity recognition: a survey and qualitative comparison. Brief Bioinform 2021; 22:6326536. [PMID: 34308472 DOI: 10.1093/bib/bbab282] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Revised: 06/07/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022] Open
Abstract
The biomedical literature is growing rapidly, and the extraction of meaningful information from the large amount of literature is increasingly important. Biomedical named entity (BioNE) identification is one of the critical and fundamental tasks in biomedical text mining. Accurate identification of entities in the literature facilitates the performance of other tasks. Given that an end-to-end neural network can automatically extract features, several deep learning-based methods have been proposed for BioNE recognition (BioNER), yielding state-of-the-art performance. In this review, we comprehensively summarize deep learning-based methods for BioNER and datasets used in training and testing. The deep learning methods are classified into four categories: single neural network-based, multitask learning-based, transfer learning-based and hybrid model-based methods. They can be applied to BioNER in multiple domains, and the results are determined by the dataset size and type. Lastly, we discuss the future development and opportunities of BioNER methods.
Collapse
Affiliation(s)
- Bosheng Song
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| | - Fen Li
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| | - Yuansheng Liu
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| | - Xiangxiang Zeng
- College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| |
Collapse
|
12
|
Text Mining Gene Selection to Understand Pathological Phenotype Using Biological Big Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] Open
|
13
|
Neves M, Ševa J. An extensive review of tools for manual annotation of documents. Brief Bioinform 2021; 22:146-163. [PMID: 31838514 PMCID: PMC7820865 DOI: 10.1093/bib/bbz130] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2019] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools. METHODS We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined each criterion on three levels of matches and a score for the final evaluation of the tools. RESULTS We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. Full compliance with our 26 criteria ranged from only 9 up to 20 criteria, which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).
Collapse
Affiliation(s)
- Mariana Neves
- German Centre for the Protection of Laboratory Animals (BfR), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
| | - Jurica Ševa
- German Centre for the Protection of Laboratory Animals (BfR), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
| |
Collapse
|
14
|
Sharma B, Willis VC, Huettner CS, Beaty K, Snowdon JL, Xue S, South BR, Jackson GP, Weeraratne D, Michelini V. Predictive article recommendation using natural language processing and machine learning to support evidence updates in domain-specific knowledge graphs. JAMIA Open 2020; 3:332-337. [PMID: 33215067 PMCID: PMC7660962 DOI: 10.1093/jamiaopen/ooaa028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2020] [Revised: 05/26/2020] [Accepted: 06/19/2020] [Indexed: 11/14/2022] Open
Abstract
Objectives Describe an augmented intelligence approach to facilitate the update of evidence for associations in knowledge graphs. Methods New publications are filtered through multiple machine learning study classifiers, and filtered publications are combined with articles already included as evidence in the knowledge graph. The corpus is then subjected to named entity recognition, semantic dictionary mapping, term vector space modeling, pairwise similarity, and focal entity match to identify highly related publications. Subject matter experts review recommended articles to assess inclusion in the knowledge graph; discrepancies are resolved by consensus. Results Study classifiers achieved F-scores from 0.88 to 0.94, and similarity thresholds for each study type were determined by experimentation. Our approach reduces human literature review load by 99%, and over the past 12 months, 41% of recommendations were accepted to update the knowledge graph. Conclusion Integrated search and recommendation exploiting current evidence in a knowledge graph is useful for reducing human cognition load.
Collapse
Affiliation(s)
| | - Van C Willis
- IBM Watson Health, Cambridge, Massachusetts, USA
| | | | - Kirk Beaty
- IBM Watson Health, Cambridge, Massachusetts, USA
| | | | - Shang Xue
- IBM Watson Health, Cambridge, Massachusetts, USA
| | | | | | | | | |
Collapse
|
15
|
Bhardwaj P, Yadav RK, Kurian S. Digitizing the Pharma Neurons - A Technological Operation in Progress! Rev Recent Clin Trials 2020; 15:178-187. [PMID: 32564760 DOI: 10.2174/1574887115666200621183459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2020] [Revised: 04/27/2020] [Accepted: 05/22/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Digitization and automation are the buzzwords in clinical research and pharma companies are investigating heavily here. Right from drug discovery to personalized medicine, digital patients and patient engagement, there is great consideration of technology at each step. METHODS The published data and online information available is reviewed to give an overview of digitization in pharma, across the drug development cycle, industry collaborations and innovations. The regulatory guidelines, innovative collaborations across industry, academics and thought leadership are presented. Also included are some ideas, suggestions, way forwards while digitizing the pharma neurons, the regulatory stand, benefits and challenges. RESULTS The innovations range from discovering personalized medicine to conducting virtual clinical trials, and maximizing data collection from the real-world experience. To address the increasing demand for the real-world data and the needs of tech-savvy patients, the innovations are shaping up accordingly. Pharma companies are collaborating with academics and they are co-innovating the technology for example Massachusetts Institute of Technology's program. This focuses on the modernization of clinical trials, strategic use of artificial intelligence and machine learning using real-world evidence, assess the risk-benefit ratio of deploying digital analytics in medicine, and proactively identifying the solutions. CONCLUSION With unfolding data on the impact of science and technology amalgamation, we need shared mindset between data scientists and medical professionals to maximize the utility of enormous health and medical data. To tackle this efficiently, there is a need of cross-collaboration and education, and align with ethical and regulatory requirements. A perfect blend of industry, regulatory, and academia will ensure successful digitization of pharma neurons.
Collapse
Affiliation(s)
| | - Raj Kumar Yadav
- Integral Health Clinic, Department of Physiology, All India Institute of Medical Sciences, New Delhi-110029, India
| | - Sojan Kurian
- Tata Consultancy Services, New York, NY, United States
| |
Collapse
|
16
|
Association extraction from biomedical literature based on representation and transfer learning. J Theor Biol 2020; 488:110112. [DOI: 10.1016/j.jtbi.2019.110112] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2019] [Accepted: 12/08/2019] [Indexed: 12/17/2022]
|
17
|
Bugnon LA, Yones C, Raad J, Gerard M, Rubiolo M, Merino G, Pividori M, Di Persia L, Milone DH, Stegmayer G. DL4papers: a deep learning approach for the automatic interpretation of scientific articles. Bioinformatics 2020; 36:3499-3506. [DOI: 10.1093/bioinformatics/btaa111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2019] [Revised: 12/27/2019] [Accepted: 02/14/2020] [Indexed: 01/26/2023] Open
Abstract
Abstract
Motivation
In precision medicine, next-generation sequencing and novel preclinical reports have led to an increasingly large amount of results, published in the scientific literature. However, identifying novel treatments or predicting a drug response in, for example, cancer patients, from the huge amount of papers available remains a laborious and challenging work. This task can be considered a text mining problem that requires reading a lot of academic documents for identifying a small set of papers describing specific relations between key terms. Due to the infeasibility of the manual curation of these relations, computational methods that can automatically identify them from the available literature are urgently needed.
Results
We present DL4papers, a new method based on deep learning that is capable of analyzing and interpreting papers in order to automatically extract relevant relations between specific keywords. DL4papers receives as input a query with the desired keywords, and it returns a ranked list of papers that contain meaningful associations between the keywords. The comparison against related methods showed that our proposal outperformed them in a cancer corpus. The reliability of the DL4papers output list was also measured, revealing that 100% of the first two documents retrieved for a particular search have relevant relations, in average. This shows that our model can guarantee that in the top-2 papers of the ranked list, the relation can be effectively found. Furthermore, the model is capable of highlighting, within each document, the specific fragments that have the associations of the input keywords. This can be very useful in order to pay attention only to the highlighted text, instead of reading the full paper. We believe that our proposal could be used as an accurate tool for rapidly identifying relationships between genes and their mutations, drug responses and treatments in the context of a certain disease. This new approach can certainly be a very useful and valuable resource for the advancement of the precision medicine field.
Availability and implementation
A web-demo is available at: http://sinc.unl.edu.ar/web-demo/dl4papers/. Full source code and data are available at: https://sourceforge.net/projects/sourcesinc/files/dl4papers/.
Contact
lbugnon@sinc.unl.edu.ar
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- L A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - C Yones
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - J Raad
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - M Gerard
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - M Rubiolo
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - G Merino
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
- Bioengineering and Bioinformatics Research and Development Institute, IBB, FIUNER-CONICET, Ruta Prov 11, Km 10.5, Oro Verde 3100, Argentina
| | - M Pividori
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
| | - L Di Persia
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - D H Milone
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| | - G Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH/UNL-CONICET, Ciudad Universitaria, Santa Fe 3000, Argentina
| |
Collapse
|
18
|
Legrand J, Gogdemir R, Bousquet C, Dalleau K, Devignes MD, Digan W, Lee CJ, Ndiaye NC, Petitpain N, Ringot P, Smaïl-Tabbone M, Toussaint Y, Coulet A. PGxCorpus, a manually annotated corpus for pharmacogenomics. Sci Data 2020; 7:3. [PMID: 31896797 PMCID: PMC6940385 DOI: 10.1038/s41597-019-0342-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2019] [Accepted: 12/02/2019] [Indexed: 11/09/2022] Open
Abstract
Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high quality annotated corpus focusing on PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.
Collapse
Affiliation(s)
- Joël Legrand
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, France.
| | | | - Cédric Bousquet
- Sorbonne Université, INSERM, Université Paris 13, LIMICS, Paris, France
| | - Kevin Dalleau
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
| | | | - William Digan
- Hôpital Européen Georges Pompidou, AP-HP, Université Paris Descartes, Université Sorbonne Paris Cité, Paris, France
- INSERM UMR 1138 Equipe 22, Université Paris Descartes, Université Sorbonne Paris Cité, Paris, France
| | - Chia-Ju Lee
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
| | | | - Nadine Petitpain
- Centre Régional de Pharmacovigilance, CHRU of Nancy, Nancy, France
| | - Patrice Ringot
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
| | | | | | - Adrien Coulet
- Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA
| |
Collapse
|
19
|
Chang B, Choi Y, Jeon M, Lee J, Han KM, Kim A, Ham BJ, Kang J. ARPNet: Antidepressant Response Prediction Network for Major Depressive Disorder. Genes (Basel) 2019; 10:genes10110907. [PMID: 31703457 PMCID: PMC6895829 DOI: 10.3390/genes10110907] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2019] [Revised: 10/25/2019] [Accepted: 10/29/2019] [Indexed: 12/20/2022] Open
Abstract
Treating patients with major depressive disorder is challenging because it takes several months for antidepressants prescribed for the patients to take effect. This limitation may result in increased risks and treatment costs. To address this limitation, an accurate antidepressant response prediction model is needed. Recently, several studies have proposed models that extract useful features such as neuroimaging biomarkers and genetic variants from patient data, and use them as predictors for predicting the antidepressant responses of patients. However, it is impossible to utilize all the different types of predictors when making a clinical decision on what drugs to prescribe for a patient. Although a machine learning-based antidepressant response prediction model has been proposed to overcome this problem, the model cannot find the most effective antidepressant for a patient. Based on a neural network, we propose an Antidepressant Response Prediction Network (ARPNet) model capturing high-dimensional patterns from useful features. Based on a literature survey and data-driven feature selection, we extract useful features from patient data, and use the features as predictors. In ARPNet, the patient representation layer captures patient features and the antidepressant prescription representation layer captures antidepressant features. Utilizing the patient and antidepressant prescription representation vectors, ARPNet predicts the degree of antidepressant response. The experimental evaluation results demonstrate that our proposed ARPNet model outperforms machine learning-based models in predicting antidepressant response. Moreover, we demonstrate the applicability of ARPNet in downstream applications in use case scenarios.
Collapse
Affiliation(s)
- Buru Chang
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea; (B.C.); (Y.C.); (M.J.); (J.L.)
| | - Yonghwa Choi
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea; (B.C.); (Y.C.); (M.J.); (J.L.)
| | - Minji Jeon
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea; (B.C.); (Y.C.); (M.J.); (J.L.)
| | - Junhyun Lee
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea; (B.C.); (Y.C.); (M.J.); (J.L.)
| | - Kyu-Man Han
- Department of Psychiatry, Korea University Anam Hospital, Korea University College of Medicine, Seoul 02841, Korea;
| | - Aram Kim
- Department of Biomedical Sciences, Korea University College of Medicine, Seoul 02841, Korea;
| | - Byung-Joo Ham
- Department of Psychiatry, Korea University Anam Hospital, Korea University College of Medicine, Seoul 02841, Korea;
- Brain Convergence Research Center, Korea University Anam Hospital, Seoul 02841, Korea
- Correspondence: (B.-J.H.); (J.K.)
| | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul 02841, Korea; (B.C.); (Y.C.); (M.J.); (J.L.)
- Interdisciplinary Graduate Program in Bioinformatics, Korea University, Seoul 02841, Korea
- Correspondence: (B.-J.H.); (J.K.)
| |
Collapse
|
20
|
Tolios A, De Las Rivas J, Hovig E, Trouillas P, Scorilas A, Mohr T. Computational approaches in cancer multidrug resistance research: Identification of potential biomarkers, drug targets and drug-target interactions. Drug Resist Updat 2019; 48:100662. [PMID: 31927437 DOI: 10.1016/j.drup.2019.100662] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2019] [Revised: 10/15/2019] [Accepted: 10/17/2019] [Indexed: 02/07/2023]
Abstract
Like physics in the 19th century, biology and molecular biology in particular, has been fertilized and enhanced like few other scientific fields, by the incorporation of mathematical methods. In the last decades, a whole new scientific field, bioinformatics, has developed with an output of over 30,000 papers a year (Pubmed search using the keyword "bioinformatics"). Huge databases of mass throughput data have been established, with ArrayExpress alone containing more than 2.7 million assays (October 2019). Computational methods have become indispensable tools in molecular biology, particularly in one of the most challenging areas of cancer research, multidrug resistance (MDR). However, confronted with a plethora of different algorithms, approaches, and methods, the average researcher faces key questions: Which methods do exist? Which methods can be used to tackle the aims of a given study? Or, more generally, how do I use computational biology/bioinformatics to bolster my research? The current review is aimed at providing guidance to existing methods with relevance to MDR research. In particular, we provide an overview on: a) the identification of potential biomarkers using expression data; b) the prediction of treatment response by machine learning methods; c) the employment of network approaches to identify gene/protein regulatory networks and potential key players; d) the identification of drug-target interactions; e) the use of bipartite networks to identify multidrug targets; f) the identification of cellular subpopulations with the MDR phenotype; and, finally, g) the use of molecular modeling methods to guide and enhance drug discovery. This review shall serve as a guide through some of the basic concepts useful in MDR research. It shall give the reader some ideas about the possibilities in MDR research by using computational tools, and, finally, it shall provide a short overview of relevant literature.
Collapse
Affiliation(s)
- A Tolios
- Department of Blood Group Serology and Transfusion Medicine, Medical University of Vienna, Vienna, Austria; Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria; Institute of Clinical Chemistry and Laboratory Medicine, Heinrich Heine University, Duesseldorf, Germany.
| | - J De Las Rivas
- Bioinformatics and Functional Genomics Group, Cancer Research Center (CiC-IMBCC, CSIC/USAL/IBSAL), Consejo Superior de Investigaciones Científicas (CSIC) and University of Salamanca (USAL), Campus Miguel de Unamuno s/n, Salamanca, Spain.
| | - E Hovig
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital and Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway.
| | - P Trouillas
- UMR 1248 INSERM, Univ. Limoges, 2 rue du Dr Marland, 87052, Limoges, France; RCPTM, University Palacký of Olomouc, tr. 17. listopadu 12, 771 46, Olomouc, Czech Republic.
| | - A Scorilas
- Department of Biochemistry & Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece.
| | - T Mohr
- Institute of Cancer Research, Department of Medicine I, Medical University of Vienna, Vienna, Austria; ScienceConsult - DI Thomas Mohr KG, Guntramsdorf, Austria.
| |
Collapse
|
21
|
Smaïl-Tabbone M, Rance B. Contributions from the 2018 Literature on Bioinformatics and Translational Informatics. Yearb Med Inform 2019; 28:190-193. [PMID: 31419831 PMCID: PMC6697500 DOI: 10.1055/s-0039-1677945] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
Objectives
: To summarize recent research and select the best papers published in 2018 in the field of Bioinformatics and Translational Informatics (BTI) for the corresponding section of the International Medical Informatics Association (IMIA) Yearbook.
Methods
: A literature review was performed for retrieving from PubMed papers indexed with keywords and free terms related to BTI. Independent review allowed the two section editors to select a list of 14 candidate best papers which were subsequently peer-reviewed. A final consensus meeting gathering the whole IMIA Yearbook editorial committee was organized to finally decide on the selection of the best papers.
Results
: Among the 636 retrieved papers published in 2018 in the various subareas of BTI, the review process selected four best papers. The first paper presents a computational method to identify molecular markers for targeted treatment of acute myeloid leukemia using multi-omics data (genome-wide gene expression profiles) and in vitro sensitivity to 160 chemotherapy drugs. The second paper describes a deep neural network approach to predict the survival of patients suffering from glioma on the basis of digitalised pathology images and genomics biomarkers. The authors of the third paper adopt a pan-cancer approach to take benefit of multi-omics data for drug repurposing. The fourth paper presents a graph-based semi-supervised method to accurate phenotype classification applied to ovarian cancer.
Conclusions
: Thanks to the normalization of open data and open science practices, research in BTI continues to develop and mature. Noteworthy achievements are sophisticated applications of leading edge machine-learning methods dedicated to personalized medicine.
Collapse
Affiliation(s)
- Malika Smaïl-Tabbone
- Loria UMR 7503, Université de Lorraine, CNRS, Inria Nancy Grand-Est, Nancy, France
| | - Bastien Rance
- HEGP, AP-HP; Université Paris Descartes, Université de Paris; UMRS 1138 Centre de Recherche des Cordeliers INSERM, Paris, France
| | | |
Collapse
|
22
|
Gachloo M, Wang Y, Xia J. A review of drug knowledge discovery using BioNLP and tensor or matrix decomposition. Genomics Inform 2019; 17:e18. [PMID: 31307133 PMCID: PMC6808632 DOI: 10.5808/gi.2019.17.2.e18] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2019] [Revised: 05/30/2019] [Accepted: 05/30/2019] [Indexed: 12/12/2022] Open
Abstract
Prediction of the relations among drug and other molecular or social entities is the main knowledge discovery pattern for the purpose of drug-related knowledge discovery. Computational approaches have combined the information from different sources and levels for drug-related knowledge discovery, which provides a sophisticated comprehension of the relationship among drugs, targets, diseases, and targeted genes, at the molecular level, or relationships among drugs, usage, side effect, safety, and user preference, at a social level. In this research, previous work from the BioNLP community and matrix or matrix decomposition was reviewed, compared, and concluded, and eventually, the BioNLP open-shared task was introduced as a promising case study representing this area.
Collapse
Affiliation(s)
- Mina Gachloo
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Yuxing Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Jingbo Xia
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| |
Collapse
|
23
|
Gruson D, Helleputte T, Rousseau P, Gruson D. Data science, artificial intelligence, and machine learning: Opportunities for laboratory medicine and the value of positive regulation. Clin Biochem 2019; 69:1-7. [PMID: 31022391 DOI: 10.1016/j.clinbiochem.2019.04.013] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2018] [Revised: 04/17/2019] [Accepted: 04/19/2019] [Indexed: 12/16/2022]
Abstract
Artificial intelligence (AI) and data science are rapidly developing in healthcare, as is their translation into laboratory medicine. Our review article presents an overview of the data science domain while discussing the reasons for its emergence. We also present several perspectives of its applications in clinical laboratories, along with potential ethical challenges related to AI and data science.
Collapse
Affiliation(s)
- Damien Gruson
- Department of Laboratory Medicine, Cliniques Universitaires St-Luc and Université Catholique de Louvain, Brussels, Belgium; Pôle de recherche en Endocrinologie, Diabète et Nutrition, Institut de Recherche Expérimentale et Clinique, Cliniques Universitaires St-Luc and Université Catholique de Louvain, Brussels, Belgium.
| | | | - Patrick Rousseau
- Department of Laboratory Medicine, Cliniques Universitaires St-Luc and Université Catholique de Louvain, Brussels, Belgium
| | - David Gruson
- Genetics Regulation for Paris Descartes-University, Paris, France
| |
Collapse
|
24
|
Xu J, Yang P, Xue S, Sharma B, Sanchez-Martin M, Wang F, Beaty KA, Dehan E, Parikh B. Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives. Hum Genet 2019; 138:109-124. [PMID: 30671672 PMCID: PMC6373233 DOI: 10.1007/s00439-019-01970-5] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2018] [Accepted: 01/02/2019] [Indexed: 02/07/2023]
Abstract
In the field of cancer genomics, the broad availability of genetic information offered by next-generation sequencing technologies and rapid growth in biomedical publication has led to the advent of the big-data era. Integration of artificial intelligence (AI) approaches such as machine learning, deep learning, and natural language processing (NLP) to tackle the challenges of scalability and high dimensionality of data and to transform big data into clinically actionable knowledge is expanding and becoming the foundation of precision medicine. In this paper, we review the current status and future directions of AI application in cancer genomics within the context of workflows to integrate genomic analysis for precision cancer care. The existing solutions of AI and their limitations in cancer genetic testing and diagnostics such as variant calling and interpretation are critically analyzed. Publicly available tools or algorithms for key NLP technologies in the literature mining for evidence-based clinical recommendations are reviewed and compared. In addition, the present paper highlights the challenges to AI adoption in digital healthcare with regard to data requirements, algorithmic transparency, reproducibility, and real-world assessment, and discusses the importance of preparing patients and physicians for modern digitized healthcare. We believe that AI will remain the main driver to healthcare transformation toward precision medicine, yet the unprecedented challenges posed should be addressed to ensure safety and beneficial impact to healthcare.
Collapse
Affiliation(s)
- Jia Xu
- IBM Watson Health, Cambridge, MA, USA.
| | | | - Shang Xue
- IBM Watson Health, Cambridge, MA, USA
| | | | | | - Fang Wang
- IBM Watson Health, Cambridge, MA, USA
| | | | | | | |
Collapse
|
25
|
Lee K, Famiglietti ML, McMahon A, Wei CH, MacArthur JAL, Poux S, Breuza L, Bridge A, Cunningham F, Xenarios I, Lu Z. Scaling up data curation using deep learning: An application to literature triage in genomic variation resources. PLoS Comput Biol 2018; 14:e1006390. [PMID: 30102703 PMCID: PMC6107285 DOI: 10.1371/journal.pcbi.1006390] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Revised: 08/23/2018] [Accepted: 07/24/2018] [Indexed: 11/18/2022] Open
Abstract
Manually curating biomedical knowledge from publications is necessary to build a knowledge based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning assisted triage method. We collect previously curated publications from two databases UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and used them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases. As the volume of literature on genomic variants continues to grow at an increasing rate, it is becoming more difficult for a curator of a variant knowledge base to keep up with and curate all the published papers. Here, we suggest a deep learning-based literature triage method for genomic variation resources. Our method achieves state-of-the-art performance on the triage task. Moreover, our model does not require any laborious preprocessing or feature engineering steps, which are required for traditional machine learning triage methods. We applied our method to the literature triage process of UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog for genomic variation by collaborating with the database curators. Both the manual curation teams confirmed that our method achieved higher precision than their previous query-based triage methods without compromising recall. Both results show that our method is more efficient and can replace the traditional query-based triage methods of manually curated databases. Our method can give human curators more time to focus on more challenging tasks such as actual curation as well as the discovery of novel papers/experimental techniques to consider for inclusion.
Collapse
Affiliation(s)
- Kyubum Lee
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | | | - Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Chih-Hsuan Wei
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Jacqueline Ann Langdon MacArthur
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Lionel Breuza
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ioannis Xenarios
- Center for Integrative Genomics, University of Lausanne, Lausanne Switzerland.,Department of Chemistry and Biochemistry, University of Geneva, Geneva, Switzerland
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| |
Collapse
|