1
Neves M, Ševa J. An extensive review of tools for manual annotation of documents. Brief Bioinform 2021; 22:146-163. [PMID: 31838514 PMCID: PMC7820865 DOI: 10.1093/bib/bbz130] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/05/2019] [Indexed: 12/16/2022]
Abstract
MOTIVATION Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools. METHODS We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined each criterion on three levels of matches and a score for the final evaluation of the tools. RESULTS We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. The number of criteria with which a tool fully complied ranged from 9 to 20 (of 26), which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).
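The abstract does not give the numeric values behind its three match levels. As a minimal sketch, assuming full, partial and no match are worth 1.0, 0.5 and 0.0 and that a tool's score is the unweighted mean over all criteria (both assumptions, not the authors' published scheme), the scoring could look like this:

```python
from enum import Enum


class Match(Enum):
    """Three match levels per criterion (numeric values are assumed)."""
    FULL = 1.0
    PARTIAL = 0.5
    NONE = 0.0


def tool_score(matches: list[Match]) -> float:
    """Unweighted mean of the per-criterion values, so the score lies in [0, 1]."""
    if not matches:
        raise ValueError("at least one criterion is required")
    return sum(m.value for m in matches) / len(matches)


def full_compliance(matches: list[Match]) -> int:
    """Number of criteria a tool fully satisfies."""
    return sum(1 for m in matches if m is Match.FULL)


# Hypothetical tool assessed on four criteria.
example = [Match.FULL, Match.FULL, Match.PARTIAL, Match.NONE]
print(tool_score(example))       # 0.625
print(full_compliance(example))  # 2
```

Under this assumption, a tool that fully met every criterion would reach the maximum score of 1.0; weighting criteria differently would only change the averaging step.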
Affiliation(s)
- Mariana Neves
- German Centre for the Protection of Laboratory Animals (BfR), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
- Jurica Ševa
- German Centre for the Protection of Laboratory Animals (BfR), German Federal Institute for Risk Assessment (BfR), Berlin, Germany
2
Islamaj R, Kwon D, Kim S, Lu Z. TeamTat: a collaborative text annotation tool. Nucleic Acids Res 2020; 48:W5-W11. [PMID: 32383756 DOI: 10.1093/nar/gkaa333] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/03/2020] [Revised: 04/16/2020] [Accepted: 04/22/2020] [Indexed: 12/20/2022]
Abstract
Manually annotated data is key to developing text-mining and information-extraction algorithms. However, human annotation requires considerable time, effort and expertise. Given the rapid growth of biomedical literature, it is paramount to build tools that facilitate speed and maintain expert quality. While existing text annotation tools may provide user-friendly interfaces to domain experts, limited support is available for figure display, project management, and multi-user team annotation. In response, we developed TeamTat (https://www.teamtat.org), a web-based annotation tool (local setup available), equipped to manage team annotation projects engagingly and efficiently. TeamTat is a novel tool for managing multi-user, multi-label document annotation, reflecting the entire production life cycle. Project managers can specify annotation schema for entities and relations and select annotator(s) and distribute documents anonymously to prevent bias. Document input format can be plain text, PDF or BioC (uploaded locally or automatically retrieved from PubMed/PMC), and output format is BioC with inline annotations. TeamTat displays figures from the full text for the annotator's convenience. Multiple users can work on the same document independently in their workspaces, and the team manager can track task completion. TeamTat provides corpus quality assessment via inter-annotator agreement statistics, and a user-friendly interface convenient for annotation review and inter-annotator disagreement resolution to improve corpus quality.
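TeamTat reports inter-annotator agreement statistics, but the abstract does not say which ones. As an illustration of one common choice, Cohen's kappa over token-level labels from two annotators can be sketched as follows; the token-level framing and the label names are assumptions, not TeamTat's implementation.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of tokens given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token labels ("O" = outside any entity).
annotator_1 = ["GENE", "O", "O", "GENE"]
annotator_2 = ["GENE", "O", "GENE", "GENE"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 tokens (0.75), chance agreement is 0.5, and kappa is therefore 0.5; span-level F1 between annotators is another frequently used alternative.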
Affiliation(s)
- Rezarta Islamaj
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Dongseop Kwon
- School of Software Convergence, Myongji University, Seoul 03674, South Korea
- Sun Kim
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
3
Xue X, Lu J, Chen J. Using NSGA-III for optimising biomedical ontology alignment. CAAI Transactions on Intelligence Technology 2019. [DOI: 10.1049/trit.2019.0014] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 11/20/2022]
Affiliation(s)
- Xingsi Xue
- College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, People's Republic of China
- Intelligent Information Processing Research Center, Fujian University of Technology, Fuzhou, Fujian, People's Republic of China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian, People's Republic of China
- Fujian Key Laboratory for Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou, Fujian, People's Republic of China
- Jiawei Lu
- College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, People's Republic of China
- Intelligent Information Processing Research Center, Fujian University of Technology, Fuzhou, Fujian, People's Republic of China
- Junfeng Chen
- College of IOT Engineering, Hohai University, Changzhou, Jiangsu, People's Republic of China
4
Abstract
Due to the continuous evolution of biomedical data, biomedical ontologies are becoming larger and more complex, which leads to a large amount of overlapping information. To support semantic interoperability between ontology-based biomedical systems, it is necessary to identify the correspondences between these pieces of information, a task commonly known as biomedical ontology matching. However, matching biomedical ontologies is challenging because (1) biomedical ontologies often possess tens of thousands of entities and (2) biomedical terminologies are complex and ambiguous. To match biomedical ontologies efficiently, this paper proposes an interactive biomedical ontology matching approach, which uses an Evolutionary Algorithm (EA) to implement the automatic matching process and involves a user in the evolving process to improve matching efficiency. In particular, we propose an Evolutionary Tabu Search (ETS) algorithm, which improves the EA's performance by introducing tabu search as a local search strategy into the evolving process. On this basis, we make the ETS-based ontology matching technique cooperate with the user in a reasonable amount of time to efficiently create high-quality alignments, and use the EA's survival-of-the-fittest mechanism to eliminate wrong correspondences introduced by erroneous user validations. The experiments are conducted on the Anatomy track and the Large BioMed track provided by the Ontology Alignment Evaluation Initiative (OAEI), and the results show that our approach efficiently exploits user intervention to improve on its non-interactive version and outperforms state-of-the-art semi-automatic ontology matching systems.
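The abstract describes tabu search used as a local search inside an EA, with survival of the fittest filtering the population. The sketch below illustrates that general memetic pattern on a toy problem. It omits the user-validation step, and the binary encoding (one bit per candidate correspondence), the similarity-minus-conflict fitness and all parameter values are assumptions for illustration, not the authors' ETS.

```python
import random
from collections import deque


def fitness(bits, sims, conflicts):
    """Hypothetical objective: summed similarity of the selected
    correspondences minus 1.0 for every selected conflicting pair."""
    score = sum(s for b, s in zip(bits, sims) if b)
    penalty = sum(1.0 for i, j in conflicts if bits[i] and bits[j])
    return score - penalty


def tabu_search(bits, sims, conflicts, iterations=50, tenure=3):
    """Best-improvement bit-flip search with a short-term tabu list."""
    current = list(bits)
    best, best_fit = list(current), fitness(current, sims, conflicts)
    tabu = deque(maxlen=tenure)  # recently flipped positions
    for _ in range(iterations):
        moves = []
        for i in range(len(current)):
            neighbour = list(current)
            neighbour[i] ^= 1
            f = fitness(neighbour, sims, conflicts)
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if i not in tabu or f > best_fit:
                moves.append((f, i, neighbour))
        if not moves:
            break
        f, i, current = max(moves, key=lambda m: m[0])
        tabu.append(i)
        if f > best_fit:
            best, best_fit = current, f
    return best, best_fit


def evolve(sims, conflicts, pop_size=10, generations=20, seed=0):
    """EA with one-point crossover and bit-flip mutation; tabu search refines
    the best offspring each generation, then the fittest individuals survive."""
    rng = random.Random(seed)
    n = len(sims)

    def key(x):
        return fitness(x, sims, conflicts)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1  # bit-flip mutation
            offspring.append(child)
        offspring.sort(key=key, reverse=True)
        offspring[0], _ = tabu_search(offspring[0], sims, conflicts)
        # Survival of the fittest over parents and offspring.
        pop = sorted(pop + offspring, key=key, reverse=True)[:pop_size]
    return pop[0], key(pop[0])


# Four candidate correspondences; 0 and 1 map the same source entity.
sims = [0.9, 0.8, 0.2, 0.7]
conflicts = [(0, 1)]
best, score = evolve(sims, conflicts)
print(best)  # [1, 0, 1, 1]
```

The tabu list stops the local search from immediately undoing a flip, which lets it step through worse neighbours instead of stalling at the first local optimum.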
Affiliation(s)
- Xingsi Xue
- College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
- Intelligent Information Processing Research Center, Fujian University of Technology, Fuzhou, Fujian, China
- Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian, China
- Fujian Key Lab for Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou, Fujian, China
- Zhi Hang
- Key Laboratory of Hunan Province for Mobile Business Intelligence, Hunan University of Commerce, Changsha, China
- Zhengyi Tang
- College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
5
López-Fernández H, Blanco-Míguez A, Fdez-Riverola F, Sánchez B, Lourenço A. DEWE: A novel tool for executing differential expression RNA-Seq workflows in biomedical research. Comput Biol Med 2019; 107:197-205. [PMID: 30849608 DOI: 10.1016/j.compbiomed.2019.02.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/22/2018] [Revised: 02/21/2019] [Accepted: 02/21/2019] [Indexed: 01/31/2023]
Abstract
BACKGROUND Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provide information on the genes expressed in specific experimental settings, differentiation or disease conditions. RNA-Seq technology is becoming the standard approach for such studies, but available analysis tools are often hard to install, configure and use for users without advanced bioinformatics skills. METHODS Within reason, DEWE aims to make RNA-Seq analysis as easy for non-proficient users as for experienced bioinformaticians. DEWE supports two well-established and widely used differential expression analysis workflows: one uses Bowtie2 and the other HISAT2 for sequence alignment, and both apply StringTie for quantification and Ballgown and edgeR for differential expression analysis. It also enables the tailored execution of individual tools and helps with the management and visualisation of differential expression results. RESULTS DEWE provides a user-friendly interface designed to reduce the learning curve for less experienced users while enabling analysis customisation and software extension by advanced users. Docker technology helps overcome installation and configuration hurdles. In addition, DEWE produces high-quality, publication-ready outputs in the form of tab-delimited files and figures, and helps researchers with further analyses, such as pathway enrichment analysis. CONCLUSIONS The capabilities of DEWE are exemplified here by a practical application to a comparative analysis of monocytes and monocyte-derived dendritic cells, a study of clinical relevance. DEWE installers and documentation are freely available at https://www.sing-group.org/dewe.
Affiliation(s)
- Hugo López-Fernández
- ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Hospital Álvaro Cunqueiro, 36312, Vigo, Spain; Universidade do Porto, Rua Alfredo Allen, 208, 4200-135, Porto, Portugal; Instituto de Biologia Molecular e Celular (IBMC), Rúa Alfredo Allen, 208, 4200-135, Porto, Portugal
- Aitor Blanco-Míguez
- ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Paseo Río Linares s/n, 33300, Villaviciosa, Asturias, Spain
- Florentino Fdez-Riverola
- ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Hospital Álvaro Cunqueiro, 36312, Vigo, Spain
- Borja Sánchez
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA), Consejo Superior de Investigaciones Científicas (CSIC), Paseo Río Linares s/n, 33300, Villaviciosa, Asturias, Spain
- Anália Lourenço
- ESEI: Escuela Superior de Ingeniería Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310, Vigo, Spain; SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Hospital Álvaro Cunqueiro, 36312, Vigo, Spain; CEB - Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal.
6
Using Compact Coevolutionary Algorithm for Matching Biomedical Ontologies. Computational Intelligence and Neuroscience 2018; 2018:2309587. [PMID: 30405706 PMCID: PMC6199880 DOI: 10.1155/2018/2309587] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/08/2018] [Accepted: 08/30/2018] [Indexed: 11/17/2022]
Abstract
In recent years, ontologies have been widely used in various domains such as medical record annotation, medical knowledge representation and sharing, clinical guideline management, and medical decision-making. To enable cooperation between intelligent applications based on biomedical ontologies, it is crucial to establish correspondences between the heterogeneous biomedical concepts in different ontologies, a task known as biomedical ontology matching. Although evolutionary algorithms (EAs) are among the state-of-the-art methods for matching heterogeneous ontologies, huge memory consumption, long runtime, and biased improvement of the solutions prevent them from efficiently matching biomedical ontologies. To overcome these shortcomings, we propose a compact coevolutionary algorithm to efficiently match biomedical ontologies. In particular, a compact EA with a local search strategy reduces memory consumption and runtime, and three subswarms with different optimization objectives help one another avoid biased improvement of the solution. In the experiments, two well-known test cases provided by the Ontology Alignment Evaluation Initiative (OAEI 2017), i.e. the anatomy track and the large biomed track, are used to test our approach's performance. The experimental results show the effectiveness of our proposal.
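A compact EA replaces an explicit population with one probability per gene, which is where its memory saving comes from. The following sketch shows the classic compact genetic algorithm (cGA) on bit strings. It is a generic illustration of the compact-EA idea, not the authors' coevolutionary algorithm: it leaves out their local search and the three cooperating subswarms, and the fitness function and parameter values are hypothetical.

```python
import random


def compact_ga(fitness, n_bits, virtual_pop=20, max_iters=5000, seed=0):
    """Classic compact GA: a probability vector stands in for a population
    of `virtual_pop` individuals, so memory grows with n_bits only."""
    rng = random.Random(seed)
    # counts[i] / virtual_pop is the probability that bit i is 1.
    counts = [virtual_pop // 2] * n_bits

    def sample():
        return [1 if rng.random() * virtual_pop < c else 0 for c in counts]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Shift each differing bit's probability by 1/virtual_pop toward the winner.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                step = 1 if winner[i] == 1 else -1
                counts[i] = min(virtual_pop, max(0, counts[i] + step))
        # Stop once every probability has converged to 0 or 1.
        if all(c in (0, virtual_pop) for c in counts):
            break
    return [1 if 2 * c > virtual_pop else 0 for c in counts]


# Toy objective (OneMax): count the 1 bits.
print(compact_ga(sum, 8))
```

Storing integer counts rather than floating-point probabilities keeps the convergence test exact; a larger `virtual_pop` makes each update smaller and reduces the risk of a bit drifting to the wrong value.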
7
López-Fernández H, Reboiro-Jato M, Glez-Peña D, Laza R, Pavón R, Fdez-Riverola F. GC4S: A bioinformatics-oriented Java software library of reusable graphical user interface components. PLoS One 2018; 13:e0204474. [PMID: 30235322 PMCID: PMC6147514 DOI: 10.1371/journal.pone.0204474] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/06/2018] [Accepted: 09/07/2018] [Indexed: 01/22/2023]
Abstract
Modern bioinformatics and computational biology are fields of study driven by the availability of effective software required for conducting appropriate research tasks. Apart from providing reliable and fast implementations of different data analysis algorithms, these software applications should also be clear and easy to use through proper user interfaces, providing appropriate data management and visualization capabilities. In this regard, the user experience obtained by interacting with these applications via their Graphical User Interfaces (GUI) is a key factor for their final success and real utility for researchers. Despite the existence of different packages and applications focused on advanced data visualization, there is a lack of specific libraries providing pertinent GUI components able to help scientific bioinformatics software developers. To that end, this paper introduces GC4S, a bioinformatics-oriented collection of high-level, extensible, and reusable Java GUI elements specifically designed to speed up bioinformatics software development. Within GC4S, developers of new applications can focus on the specific GUI requirements of their projects, relying on GC4S for generalities and abstractions. GC4S is free software distributed under the terms of GNU Lesser General Public License and both source code and documentation are publicly available at http://www.sing-group.org/gc4s.
Affiliation(s)
- Hugo López-Fernández
- ESEI—Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
- CINBIO—Centro de Investigaciones Biomédicas, Universidad de Vigo, Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- Instituto de Investigação e Inovação em Saúde (I3S), Universidade do Porto, Porto, Portugal
- Instituto de Biologia Molecular e Celular (IBMC), Porto, Portugal
- Miguel Reboiro-Jato
- ESEI—Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
- CINBIO—Centro de Investigaciones Biomédicas, Universidad de Vigo, Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- Daniel Glez-Peña
- ESEI—Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
- CINBIO—Centro de Investigaciones Biomédicas, Universidad de Vigo, Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- Rosalía Laza
- ESEI—Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
- CINBIO—Centro de Investigaciones Biomédicas, Universidad de Vigo, Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- Reyes Pavón
- ESEI—Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
- CINBIO—Centro de Investigaciones Biomédicas, Universidad de Vigo, Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- Florentino Fdez-Riverola
- ESEI—Escuela Superior de Ingeniería Informática, Universidad de Vigo, Ourense, Spain
- CINBIO—Centro de Investigaciones Biomédicas, Universidad de Vigo, Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
8
Perceptions of the use of intelligent information access systems in university level active learning activities among teachers of biomedical subjects. Int J Med Inform 2018; 112:21-33. [PMID: 29500018 DOI: 10.1016/j.ijmedinf.2017.12.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/12/2017] [Revised: 12/19/2017] [Accepted: 12/21/2017] [Indexed: 11/23/2022]
Abstract
BACKGROUND Student participation and the use of active methodologies in classroom learning are being increasingly emphasized. The use of intelligent systems can be of great help when designing and developing these types of activities. Recently, emerging disciplines such as 'educational data mining' and 'learning analytics and knowledge' have provided clear examples of the importance of the use of artificial intelligence techniques in education. OBJECTIVE The main objective of this study was to gather expert opinions regarding the benefits of using complementary methods that are supported by intelligent systems, specifically, by intelligent information access systems, when processing texts written in natural language and the benefits of using these methods as companion tools to the learning activities that are employed by biomedical and health sciences teachers. METHODS Eleven teachers of degree courses who belonged to the Faculties of Biomedical Sciences (BS) and Health Sciences (HS) of a Spanish university in Madrid were individually interviewed. These interviews were conducted using a mixed-methods questionnaire that included 66 predefined close-ended and open-ended questions. In our study, three intelligent information access systems (i.e., BioAnnote, CLEiM and MedCMap) were successfully used to evaluate the teachers' perceptions regarding the utility of these systems and their different methods in learning activities. RESULTS All teachers reported using active learning methods in the classroom, most of which were computer programs that were used for initially designing and later executing learning activities. All teachers used case-based learning methods in the classroom, with a specific emphasis on case reports written in Spanish and/or English. In general, few or none of the teachers were familiar with the technical terms related to the technologies used for these activities such as "intelligent systems" or "concept/mental maps". 
However, they clearly realized the potential applicability of such approaches in both the preparation and the effective use of these activities in the classroom. Specifically, the themes highlighted by a greater number of teachers after analyzing the responses to the open-ended questions were the usefulness of the BioAnnote system in providing reliable sources of medical information and the usefulness of the bilingual nature of the CLEiM system for learning medical terminology in English. CONCLUSIONS Three intelligent information access systems were successfully used to evaluate the teachers' perceptions regarding the utility of these systems in learning activities. The results of this study showed that integration of reliable sources of information, bilingualism and selective annotation of concepts were the features most valued by the teachers, who also considered the incorporation of these systems into learning activities to be potentially very useful. In addition, in the context of our experimental conditions, our work provides useful insights into how to appropriately integrate intelligent information access systems of this type into learning activities, revealing key themes to consider when developing such approaches.
9
López-Fernández H, Araújo JE, Jorge S, Glez-Peña D, Reboiro-Jato M, Santos HM, Fdez-Riverola F, Capelo JL. S2P: A software tool to quickly carry out reproducible biomedical research projects involving 2D-gel and MALDI-TOF MS protein data. Computer Methods and Programs in Biomedicine 2018; 155:1-9. [PMID: 29512488 DOI: 10.1016/j.cmpb.2017.11.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/25/2017] [Revised: 11/21/2017] [Accepted: 11/24/2017] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE 2D-gel electrophoresis is widely used in combination with MALDI-TOF mass spectrometry to analyze the proteome of biological samples. For instance, it can be used to discover proteins that are differentially expressed between two groups (e.g. two disease conditions, case vs. control, etc.), thus obtaining a set of potential biomarkers. This procedure requires a great deal of data processing to prepare data for analysis or to merge and integrate data from different sources. This kind of work is usually done manually (e.g. copying and pasting data into spreadsheet files), which is highly time-consuming and distracts the researcher from other important, core tasks. Moreover, carrying out a repetitive process manually, without automation, is prone to error, which threatens reliability and reproducibility. The objective of this paper is to present S2P, open-source software that overcomes these drawbacks. METHODS S2P is implemented in Java on top of the AIBench framework and relies on well-established open-source libraries to accomplish different tasks. RESULTS S2P is an AIBench-based, multiplatform desktop application specifically aimed at processing 2D-gel and MALDI mass spectrometry protein identification-based data in a computer-aided, reproducible manner. Different case studies are presented to show the usefulness of S2P. CONCLUSIONS S2P is open source and free to all users at http://www.sing-group.org/s2p. Through its user-friendly GUI, S2P dramatically reduces the time that researchers need to invest in order to prepare data for analysis.
Affiliation(s)
- Hugo López-Fernández
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain; UCIBIO-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Portugal.
- José E Araújo
- UCIBIO-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Portugal; ProteoMass Scientific Society, Madan Parque, Rua dos Inventores, 2825-182 Caparica, Portugal
- Susana Jorge
- UCIBIO-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Portugal; ProteoMass Scientific Society, Madan Parque, Rua dos Inventores, 2825-182 Caparica, Portugal
- Daniel Glez-Peña
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- Miguel Reboiro-Jato
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- Hugo M Santos
- UCIBIO-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Portugal; ProteoMass Scientific Society, Madan Parque, Rua dos Inventores, 2825-182 Caparica, Portugal
- Florentino Fdez-Riverola
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; CINBIO - Centro de Investigaciones Biomédicas, University of Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- José L Capelo
- UCIBIO-REQUIMTE, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Portugal; ProteoMass Scientific Society, Madan Parque, Rua dos Inventores, 2825-182 Caparica, Portugal
10
Jiang J, Xie J, Zhao C, Su J, Guan Y, Yu Q. Max-margin weight learning for medical knowledge network. Computer Methods and Programs in Biomedicine 2018; 156:179-190. [PMID: 29428070 DOI: 10.1016/j.cmpb.2018.01.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/30/2017] [Revised: 10/30/2017] [Accepted: 01/10/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND AND OBJECTIVE The application of medical knowledge strongly affects the performance of intelligent diagnosis, and the method used to learn the weights of medical knowledge plays a substantial role in probabilistic graphical models (PGMs). The purpose of this study is to investigate a discriminative weight-learning method based on a medical knowledge network (MKN). METHODS We propose a training model called the maximum margin medical knowledge network (M3KN), which is strictly derived for calculating the weight of medical knowledge. Using the definition of a reasonable margin, the weight learning can be transformed into a margin optimization problem. To solve the optimization problem, we adopt a sequential minimal optimization (SMO) algorithm and the clique property of a Markov network. Ultimately, M3KN not only incorporates the inference ability of PGMs but also deals with high-dimensional logic knowledge. RESULTS The experimental results indicate that M3KN obtains a higher F-measure score than the maximum likelihood learning algorithm of MKN for both Chinese Electronic Medical Records (CEMRs) and Blood Examination Records (BERs). Furthermore, the proposed approach is clearly superior to some classical machine learning algorithms for medical diagnosis. To demonstrate the importance of domain knowledge, we numerically verify that the diagnostic accuracy of M3KN gradually improves as the number of learned CEMRs, which contain important medical knowledge, increases. CONCLUSIONS Our experimental results show that the proposed method performs reliably for learning the weights of medical knowledge. M3KN outperforms other existing methods by achieving an F-measure of 0.731 for CEMRs and 0.4538 for BERs. This further illustrates that M3KN can facilitate investigations of intelligent healthcare.
Affiliation(s)
- Jingchi Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Comprehensive Building 803 Harbin 150001, China.
- Jing Xie
- School of Computer Science and Technology, Harbin Institute of Technology, Comprehensive Building 803 Harbin 150001, China
- Chao Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Comprehensive Building 803 Harbin 150001, China
- Jia Su
- School of Computer Science and Technology, Harbin Institute of Technology, Comprehensive Building 803 Harbin 150001, China
- Yi Guan
- School of Computer Science and Technology, Harbin Institute of Technology, Comprehensive Building 803 Harbin 150001, China.
- Qiubin Yu
- Medical Record Room, The 2nd Affiliated Hospital of Harbin Medical University, Harbin 150086, China
11
Zhao C, Jiang J, Xu Z, Guan Y. A study of EMR-based medical knowledge network and its applications. Computer Methods and Programs in Biomedicine 2017; 143:13-23. [PMID: 28391811 DOI: 10.1016/j.cmpb.2017.02.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 04/19/2016] [Revised: 01/23/2017] [Accepted: 02/09/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE Electronic medical records (EMRs) contain a large amount of medical knowledge that can be used for clinical decision support. We attempt to integrate this medical knowledge into a complex network and then implement a diagnosis model based on this network. METHODS The dataset of our study contains 992 records uniformly sampled from different departments of the hospital. To integrate the knowledge in these records, an EMR-based medical knowledge network (EMKN) is constructed. This network takes medical entities as nodes and co-occurrence relationships between pairs of entities as edges. Selected properties of this network are analyzed. To make use of this network, a basic diagnosis model is implemented. Seven hundred records are randomly selected to reconstruct the network, and the remaining 292 records are used as test records. The vector space model is applied to represent the relationships between diseases and symptoms. Because a record may contain more than one actual disease, the recall rate of the first ten results and the average precision are adopted as evaluation measures. RESULTS Compared with a random network of the same size, this network has a similar average path length but a much higher clustering coefficient. Additionally, there are direct correlations between the community structure and the real department classes in the hospital. For the diagnosis model, the vector space model using disease as a base obtains the best result: at least one correct disease is found among the first ten results in 73.27% of the records. CONCLUSION We constructed an EMR-based medical knowledge network by extracting the medical entities. This network has small-world and scale-free properties. Moreover, the community structure shows that entities from the same department tend to self-aggregate. Based on this network, a diagnosis model was proposed. This model uses only symptoms as inputs and is not restricted to a specific disease. The experiments demonstrated that EMKN is a simple and universal technique for integrating different medical knowledge from EMRs and can be used for clinical decision support.
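A minimal Python sketch of the network-plus-vector-space idea described above, assuming each record has already been reduced to a set of extracted entity strings and that raw co-occurrence counts serve as both edge weights and vector weights (the abstract does not state the paper's actual weighting scheme). All function names are hypothetical; "disease as a base" is read here as one symptom vector per disease, ranked by cosine similarity to the query symptoms.

```python
import math
from collections import Counter
from itertools import combinations


def build_network(records):
    """Build an undirected co-occurrence network.

    records: list of sets of entity strings (one set per EMR).
    Returns a Counter mapping sorted (entity_a, entity_b) pairs to the
    number of records in which the two entities co-occur.
    """
    edges = Counter()
    for entities in records:
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def disease_vectors(edges, diseases, symptoms):
    """Represent each disease as a sparse vector over symptoms (disease as base)."""
    vectors = {d: {} for d in diseases}
    for (a, b), weight in edges.items():
        if a in diseases and b in symptoms:
            vectors[a][b] = weight
        elif b in diseases and a in symptoms:
            vectors[b][a] = weight
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_diseases(query_symptoms, vectors, k=10):
    """Return the top-k diseases by cosine similarity to a binary symptom query."""
    query = {s: 1 for s in query_symptoms}
    ranked = sorted(vectors, key=lambda d: (-cosine(query, vectors[d]), d))
    return ranked[:k]


def hit_at_k(ranked, true_diseases):
    """True if at least one actual disease appears in the ranked list."""
    return any(d in true_diseases for d in ranked)
```

Averaging `hit_at_k` over held-out records gives the "at least one correct disease in the first ten results" measure reported in the abstract.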
Affiliation(s)
- Chao Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
- Jingchi Jiang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
- Zhiming Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
- Yi Guan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
12
López-Fernández H, Reboiro-Jato M, Pérez Rodríguez JA, Fdez-Riverola F, Glez-Peña D. The Artificial Intelligence Workbench: a retrospective review. ADCAIJ: Advances in Distributed Computing and Artificial Intelligence Journal 2016; 5:73-85. [DOI: 10.14201/adcaij2016517385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2023]
Abstract
Over the last decade, biomedical and bioinformatics researchers have demanded advanced, user-friendly applications for practical use. In this context, the Artificial Intelligence Workbench (AIBench), an open-source Java desktop application framework for scientific software development, emerged with the goal of supporting both fundamental and applied research in translational biomedicine and bioinformatics. AIBench automatically provides functionalities common to scientific applications, such as user parameter definition, logging facilities, multi-threaded execution, experiment repeatability, workflow management and fast user interface development, among others. Moreover, AIBench promotes a reusable component-based architecture, which also allows new applications to be assembled by reusing libraries from existing projects or third-party software. Ten years have passed since the first release of AIBench, so it is time to look back, check whether it has fulfilled the purposes for which it was conceived, and examine how it has evolved over time.
13
Keretna S, Lim CP, Creighton D, Shaban KB. Enhancing medical named entity recognition with an extended segment representation technique. Computer Methods and Programs in Biomedicine 2015; 119:88-100. [PMID: 25791277 DOI: 10.1016/j.cmpb.2015.02.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/16/2014] [Revised: 02/18/2015] [Accepted: 02/24/2015] [Indexed: 06/04/2023]
Abstract
OBJECTIVE The objective of this paper is to formulate an extended segment representation (SR) technique to enhance named entity recognition (NER) in medical applications. METHODS An extension to the IOBES (Inside/Outside/Begin/End/Single) SR technique is formulated. In the proposed extension, a new class is assigned to words that do not belong to a named entity (NE) in one context but appear as an NE in other contexts. Such ambiguity can negatively affect the results of classification-based NER techniques. Assigning a separate class to words that can potentially cause ambiguity allows a classifier to detect NEs more accurately, thereby increasing classification accuracy. RESULTS The proposed SR technique is evaluated on the i2b2 2010 medical challenge data set with eight different classifiers. Each classifier is trained separately to extract three types of medical NE, namely treatment, problem, and test. Across the three experiments, the extended SR technique improves the average F1-measure for seven of the eight classifiers. The kNN classifier shows an average reduction of 0.18% across the three experiments, while the C4.5 classifier records an average improvement of 9.33%.
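The extension can be sketched in Python as a preprocessing step over IOBES-tagged training data. The abstract does not say exactly how ambiguous words are identified or what the new class is called, so this sketch assumes that any word tagged outside an entity in one sentence but inside an entity elsewhere in the corpus is ambiguous, and labels it with a hypothetical tag `A`. Function names are likewise hypothetical.

```python
def iobes_tags(tokens, spans):
    """Encode entity spans as IOBES tags.

    spans: list of (start, end_exclusive, entity_type) over token indices.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"  # single-token entity
        else:
            tags[start] = f"B-{etype}"  # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"  # inner tokens
            tags[end - 1] = f"E-{etype}"  # last token
    return tags


def extend_with_ambiguous_class(corpus, ambiguous_tag="A"):
    """Relabel 'O' tokens whose word occurs inside an entity elsewhere in the corpus.

    corpus: list of (tokens, tags) pairs.
    ambiguous_tag: hypothetical name for the extra class.
    """
    # Every (lowercased) word that appears inside some entity anywhere.
    entity_words = {
        tok.lower()
        for tokens, tags in corpus
        for tok, tag in zip(tokens, tags)
        if tag != "O"
    }
    extended = []
    for tokens, tags in corpus:
        new_tags = [
            ambiguous_tag if tag == "O" and tok.lower() in entity_words else tag
            for tok, tag in zip(tokens, tags)
        ]
        extended.append((tokens, new_tags))
    return extended
```

For example, "cold" tagged as a problem in "patient has a cold" would cause the non-entity "cold" in "the room is cold" to receive the extra class, giving a classifier a separate label for that context.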
Affiliation(s)
- Sara Keretna
- Centre for Intelligent Systems Research, Deakin University, Australia.
- Chee Peng Lim
- Centre for Intelligent Systems Research, Deakin University, Australia.
- Doug Creighton
- Centre for Intelligent Systems Research, Deakin University, Australia.
- Khaled Bashir Shaban
- Computer Science and Engineering Department, College of Engineering, Qatar University, Qatar.
14
Pérez-Pérez M, Glez-Peña D, Fdez-Riverola F, Lourenço A. Marky: a tool supporting annotation consistency in multi-user and iterative document annotation projects. Computer Methods and Programs in Biomedicine 2015; 118:242-251. [PMID: 25480679 DOI: 10.1016/j.cmpb.2014.11.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/24/2014] [Revised: 10/24/2014] [Accepted: 11/18/2014] [Indexed: 06/04/2023]
Abstract
BACKGROUND AND OBJECTIVES Document annotation is a key task in the development of Text Mining methods and applications. High-quality annotated corpora are invaluable, but their preparation requires considerable resources and time. Although existing annotation tools offer domain experts good user interaction interfaces, their project management and quality control capabilities are still limited. Therefore, the current work introduces Marky, a new Web-based document annotation tool equipped to manage multi-user and iterative projects and to evaluate annotation quality throughout the project life cycle. METHODS At its core, Marky is a Web application based on the open-source CakePHP framework. The user interface relies on HTML5 and CSS3 technologies. The Rangy library assists in the browser-independent implementation of common DOM range and selection tasks, and Ajax and jQuery are used to enhance user-system interaction. RESULTS Marky provides solid management of inter- and intra-annotator work. Most notably, its annotation tracking system supports systematic and on-demand agreement analysis and annotation amendment. Each annotator may work on documents as usual, but all annotations made are saved by the tracking system and can be compared later. Thus, the project administrator is able to evaluate annotation consistency among annotators and across rounds of annotation, while annotators are able to reject or amend subsets of annotations made in previous rounds. As a side effect, the tracking system minimises resource and time consumption. CONCLUSIONS Marky is a novel environment for managing multi-user and iterative document annotation projects. Compared to other tools, Marky offers a similarly intuitive visual annotation experience while providing unique means to minimise annotation effort and enforce annotation quality, and therefore corpus consistency. Marky is freely available for non-commercial use at http://sing.ei.uvigo.es/marky.
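The abstract does not specify which agreement measure Marky computes. Purely as an illustration, the Python sketch below uses exact-match F1 between annotators' span sets (a common choice for span annotation) for inter-annotator comparison, and a set difference to show what one annotator changed between two rounds. The span representation and all function names are assumptions; the type hints require Python 3.9 or later.

```python
from itertools import combinations

# A span annotation: (start_offset, end_offset, label). Hypothetical representation.
Span = tuple[int, int, str]


def f1_agreement(a: set[Span], b: set[Span]) -> float:
    """Exact-match F1 between two annotation sets; symmetric in a and b."""
    if not a and not b:
        return 1.0  # both annotators left the document empty
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_agreement(annotations: dict[str, set[Span]]) -> dict[tuple[str, str], float]:
    """F1 agreement for every pair of annotators on the same document."""
    return {
        (x, y): f1_agreement(annotations[x], annotations[y])
        for x, y in combinations(sorted(annotations), 2)
    }


def round_diff(previous: set[Span], current: set[Span]) -> dict[str, set[Span]]:
    """Annotations added and removed by one annotator between two rounds."""
    return {"added": current - previous, "removed": previous - current}
```

Because every round's annotations are kept, both comparisons can be run on demand, which mirrors the inter- and intra-annotator checks the abstract attributes to the tracking system.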
Affiliation(s)
- Martín Pérez-Pérez
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain.
- Daniel Glez-Peña
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain.
- Florentino Fdez-Riverola
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain.
- Anália Lourenço
- ESEI - Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, Universidad de Vigo, 32004 Ourense, Spain; Centre of Biological Engineering, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal.
15
Martínez-Romero M, Vázquez-Naya JM, Pereira J, Pazos A. BiOSS: A system for biomedical ontology selection. Computer Methods and Programs in Biomedicine 2014; 114:125-140. [PMID: 24573129 DOI: 10.1016/j.cmpb.2014.01.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/11/2013] [Revised: 01/23/2014] [Accepted: 01/28/2014] [Indexed: 06/03/2023]
Abstract
In biomedical informatics, ontologies are considered a key technology for annotating, retrieving and sharing the huge volume of publicly available data. Due to the increasing amount, complexity and variety of existing biomedical ontologies, choosing which ones to use for a semantic annotation problem, or for designing a specific application, is a difficult task. As a consequence, the design of approaches and tools aimed at facilitating the selection of biomedical ontologies is becoming a priority. In this paper we present BiOSS, a novel system for the selection of biomedical ontologies. BiOSS evaluates the adequacy of an ontology to a given domain according to three criteria: (1) the extent to which the ontology covers the domain; (2) the semantic richness of the ontology in the domain; and (3) the popularity of the ontology in the biomedical community. BiOSS has been applied to five representative ontology selection problems and has also been compared to existing methods and tools. The results are promising and show the usefulness of BiOSS for solving real-world ontology selection problems. BiOSS is openly available both as a web tool and as a web service.
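BiOSS's actual formulas are not given in the abstract. The Python sketch below only illustrates how three criteria of this kind could be combined into a ranking: the coverage measure (fraction of domain terms found), the richness proxy (mean count of descriptive properties per matched term), the popularity value and the equal default weights are all assumptions, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    name: str
    # term label -> number of descriptive properties (synonyms, definitions,
    # relations); a hypothetical proxy for semantic richness
    terms: dict[str, int]
    # hypothetical popularity signal, e.g. a usage count
    popularity: float


def coverage(onto: Ontology, domain: set[str]) -> float:
    """Fraction of domain terms found in the ontology."""
    return len(domain & set(onto.terms)) / len(domain) if domain else 0.0


def richness(onto: Ontology, domain: set[str]) -> float:
    """Mean number of descriptive properties over the matched terms."""
    matched = domain & set(onto.terms)
    return sum(onto.terms[t] for t in matched) / len(matched) if matched else 0.0


def rank_ontologies(candidates, domain, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank candidates by a weighted sum of max-normalized criterion scores."""
    raw = {
        o.name: (coverage(o, domain), richness(o, domain), o.popularity)
        for o in candidates
    }
    # Normalize each criterion by its maximum across candidates (avoid division by 0).
    maxima = [max(vals[i] for vals in raw.values()) or 1.0 for i in range(3)]
    scores = {
        name: sum(w * v / m for w, v, m in zip(weights, vals, maxima))
        for name, vals in raw.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Max-normalization keeps the three criteria on a comparable 0 to 1 scale, so the weights alone decide the trade-off; shifting weight toward coverage, for instance, favors a less popular ontology that matches more of the domain.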
Affiliation(s)
- José M Vázquez-Naya
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain.
- Javier Pereira
- IMEDIR Center, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain.
- Alejandro Pazos
- Department of Information and Communication Technologies, Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain.