1
Preiss J. Avoiding background knowledge: literature based discovery from important information. BMC Bioinformatics 2023; 23:570. [PMID: 36918777 PMCID: PMC10013236 DOI: 10.1186/s12859-022-04892-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 08/16/2022] [Indexed: 03/16/2023]
Abstract
BACKGROUND Automatic literature based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of [Formula: see text] and [Formula: see text] relations can be simply connected to deduce [Formula: see text]. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject[Formula: see text](predicate)[Formula: see text]object triples as the [Formula: see text] relations, but too many proposed connections remain for manual verification. RESULTS Based on the hypothesis that only a small number of subject-predicate-object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these before literature based discovery is performed using only these important triples. While the method exploits the availability of full texts of publications in the CORD-19 dataset (making use of the fact that a novel contribution is likely to be mentioned in both the abstract and the body of a paper) to build a training set, the resulting tool can be applied to papers with only abstracts available. Candidate hidden knowledge pairs generated from unfiltered triples and those built from important triples only are compared using a variety of timeslicing gold standards. CONCLUSIONS The quantity of proposed knowledge pairs is reduced by a factor of [Formula: see text], and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases by up to a factor of 10. We argue that the gold standard needs to be carefully considered, and release as yet undiscovered candidate knowledge pairs based on important triples alongside this work.
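The connection step described in the background of this abstract can be sketched as follows. This is only a minimal illustration of the general idea (joining relations that share an intermediate term and dropping pairs that are already stated directly), not the paper's pipeline. It assumes the elided formulas denote the usual pattern of two chained relations implying a third; the function name `candidate_pairs` and the toy triples are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(triples):
    """Propose hidden (A, C) pairs by joining triples that share a middle term B.

    `triples` is an iterable of (subject, predicate, object) tuples.
    Pairs that already appear directly in some triple are treated as known
    and are not proposed.
    """
    objects_of = defaultdict(set)   # subject -> set of objects it is related to
    known = set()                   # (subject, object) pairs stated directly
    for subj, _pred, obj in triples:
        objects_of[subj].add(obj)
        known.add((subj, obj))

    proposed = set()
    for a, bs in objects_of.items():
        for b in bs:
            # .get avoids inserting new keys into the dict while iterating over it
            for c in objects_of.get(b, ()):
                if c != a and (a, c) not in known:
                    proposed.add((a, c))
    return proposed


# Hypothetical toy triples, for illustration only.
toy_triples = [
    ("fish oil", "REDUCES", "blood viscosity"),
    ("blood viscosity", "AFFECTS", "Raynaud disease"),
    ("fish oil", "TREATS", "inflammation"),
]
pairs = candidate_pairs(toy_triples)
```

Even on realistic inputs this join grows quickly with the number of triples, which is the motivation the abstract gives for filtering to important triples before connecting them.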
Affiliation(s)
- Judita Preiss
- Information School, University of Sheffield, S1 4DP, Sheffield, UK.
2
Kukafka R, Zhou J, Ji M, Pei L, Wang Z. Development and Evaluation of Health Recommender Systems: Systematic Scoping Review and Evidence Mapping. J Med Internet Res 2023; 25:e38184. [PMID: 36656630 PMCID: PMC9896351 DOI: 10.2196/38184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 09/12/2022] [Accepted: 11/01/2022] [Indexed: 11/07/2022]
Abstract
BACKGROUND Health recommender systems (HRSs) are information retrieval systems that provide users with relevant items according to the users' needs, which can motivate and engage users to change their behavior. OBJECTIVE This study aimed to identify the development and evaluation of HRSs and create an evidence map. METHODS A total of 6 databases were searched to identify HRSs reported in studies from inception up to June 30, 2022, followed by forward citation and grey literature searches. Titles, abstracts, and full texts were screened independently by 2 reviewers, with discrepancies resolved by a third reviewer when necessary. Data extraction was performed by one reviewer and checked by a second reviewer. This review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) statement. RESULTS A total of 51 studies were included for data extraction. Recommender systems were used across different health domains, such as general health promotion, lifestyle, and generic health service. A total of 23 studies reported the use of a combination of recommender techniques, classified as hybrid recommender systems, which are the most commonly used recommender techniques in HRSs. In the HRS design stage, only 10 of 51 (19.6%) recommender systems considered personal preferences of end users in the design or development of the system; a total of 29 studies reported the user interface of HRSs, and most HRSs worked on users' mobile interfaces, usually a mobile app. Two categories of HRS evaluations were used, and evaluations of HRSs varied greatly; 62.7% (32/51) of the studies used offline evaluations using computational methods (no users), and 33.3% (17/51) of the studies included end users in their HRS evaluation. CONCLUSIONS Through this scoping review, nonmedical professionals and policy makers can visualize and better understand HRSs for future studies. Health care professionals and end users should be encouraged to participate in the future design and development of HRSs to optimize their utility and successful implementation. Detailed evaluations of HRSs in a user-centered approach are needed in future studies.
Affiliation(s)
- Jia Zhou
- School of Nursing, Peking University, Beijing, China
- Mengmeng Ji
- School of Nursing, Peking University, Beijing, China
- Lusi Pei
- Wuhan Design and Engineering College, Wuhan, China
- Zhiwen Wang
- School of Nursing, Peking University, Beijing, China
3
NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022]
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that brings together Collaborative Filtering, to address the incompleteness of the prior knowledge, and a biologically justified model of gene expression (a sparse Network Component Analysis based model). We validated NetREX-CF using yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis following a high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown results not only extensively validate the GRN we built, but also provide a benchmark that our community can use for evaluating GRNs. Finally, using previously published human data, we demonstrate that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.
4
Cheerkoot-Jalim S, Khedo KK. Literature-based discovery approaches for evidence-based healthcare: a systematic review. HEALTH AND TECHNOLOGY 2021; 11:1205-1217. [PMID: 34722102 PMCID: PMC8542914 DOI: 10.1007/s12553-021-00605-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2021] [Accepted: 09/28/2021] [Indexed: 12/12/2022]
Abstract
Purpose Literature-Based Discovery (LBD) is a text mining technique used to generate novel hypotheses from vast amounts of literature sources, by identifying links between concepts from disparate sources. One of the main areas where it has been predominantly applied is the healthcare domain, whereby promising results, in the form of novel hypotheses, have been reported. The purpose of this work was to conduct a systematic literature review of recent publications on LBD in the healthcare domain in order to assess the trends in the approaches used and to identify issues and challenges for such systems. Methods The review was conducted following the principles of the Kitchenham method. The selected studies have been scrutinized and the derived findings have been reported following the PRISMA guidelines. Results The review results reveal useful information regarding the application areas, the data sources considered, the approaches used, the performance in terms of accuracy and reliability and future research challenges. The results of this review will be beneficial to LBD researchers and other stakeholders in the healthcare domain, by providing them with useful insights on the approaches to adopt, data sources to consider, evaluation model to use and challenges to reflect on. Conclusion The synthesis of the results of this work has shed light on recent issues and challenges that drive new LBD models and provides avenues for their application in other diverse areas in the healthcare domain. To the best of our knowledge, no such recent review has been conducted.
Affiliation(s)
- Sudha Cheerkoot-Jalim
- Department of Information and Communication Technologies, University of Mauritius, Reduit, Mauritius
- Kavi Kumar Khedo
- Department of Digital Technologies, University of Mauritius, Reduit, Mauritius
5
6
Porter AL, Zhang Y, Huang Y, Wu M. Tracking and Mining the COVID-19 Research Literature. Front Res Metr Anal 2020; 5:594060. [PMID: 33870056 PMCID: PMC8025982 DOI: 10.3389/frma.2020.594060] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/12/2020] [Accepted: 09/28/2020] [Indexed: 12/21/2022]
Abstract
The unprecedented, explosive growth of the COVID-19 domain presents challenges to researchers to keep up with research knowledge within the domain. This article profiles this research to help make that knowledge more accessible via overviews and novel categorizations. We provide websites offering means for researchers to probe more deeply to address specific questions. We further probe and reassemble COVID-19 topical content to address research issues concerning topical evolution and emphases on tactical vs. strategic approaches to mitigate this pandemic and reduce future viral threats. Data suggest that heightened attention to strategic, immunological factors is warranted. Connecting with and transferring in research knowledge from outside the COVID-19 domain demand a viable COVID-19 knowledge model. This study provides complementary topical categorizations to facilitate such modeling to inform future Literature-Based Discovery endeavors.
Affiliation(s)
- Alan L Porter
- Search Technology, Inc., Norcross, GA, United States; Science, Technology & Innovation Policy, Georgia Tech, Atlanta, GA, United States
- Yi Zhang
- Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, University of Technology Sydney, Ultimo, NSW, Australia
- Ying Huang
- Department of Management, Strategy and Innovation (MSI), Center for R&D Monitoring (ECOOM), KU Leuven, Leuven, Belgium; School of Information Management, Wuhan University, Wuhan, China
- Mengjia Wu
- Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, University of Technology Sydney, Ultimo, NSW, Australia
7
Lin L, Liu J, Lv Y, Guo F. A similarity model based on reinforcement local maximum connected same destination structure oriented to disordered fusion of knowledge graphs. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01673-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/24/2022]
8
Use Chou’s 5-steps rule to identify DNase I hypersensitive sites via dinucleotide property matrix and extreme gradient boosting. Mol Genet Genomics 2020; 295:1431-1442. [DOI: 10.1007/s00438-020-01711-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/14/2020] [Accepted: 07/11/2020] [Indexed: 01/08/2023]
9
Wei W, Liu C. Prognostic and predictive roles of microRNA‑411 and its target STK17A in evaluating radiotherapy efficacy and their effects on cell migration and invasion via the p53 signaling pathway in cervical cancer. Mol Med Rep 2019; 21:267-281. [PMID: 31746360 PMCID: PMC6896360 DOI: 10.3892/mmr.2019.10826] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/08/2018] [Accepted: 10/10/2018] [Indexed: 02/06/2023]
Abstract
Cervical cancer is one of the most common gynecological malignancies worldwide. However, the pathogenesis of cervical cancer remains to be fully elucidated. Increasing evidence shows that microRNAs (miRNAs) may be involved in the pathogenesis of cervical cancer. The present study tested the hypothesis that the overexpression of miRNA (miR)-411 may delay, whereas the overexpression of serine/threonine kinase 17a (STK17A) may contribute to, cervical cancer development and progression through the p53 pathway. Cervical cancer tissues and adjacent normal tissues were obtained from 141 patients with cervical cancer following radiotherapy, with efficacy evaluated. The receiver operating characteristic curve was plotted to show the value of miR-411 and STK17A in predicting the efficacy of radiotherapy. Cox's proportional hazards regression model was utilized for multivariate analysis. A series of inhibitors, mimics or small interfering RNAs against STK17A were introduced to validate the regulatory mechanism of miR-411 in governing STK17A, determined with a luciferase reporter gene assay. The expression of miR-411 and STK17A, and the status of the p53 signaling pathway were evaluated. The colony forming ability, proliferation, migration, invasion and apoptosis of CaSki cells were assessed using a colony formation assay, 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide assay, Transwell assay and flow cytometry, respectively. miR-411 was upregulated but STK17A was reciprocal in cervical tissues. The overexpression of miR-411 and low expression of STK17A were correlated with high efficacy of radiotherapy. miR-411 and STK17A had predictive value for the efficacy of radiotherapy; miR-411 was the protective factor and STK17A was a risk factor for prognosis of cervical cancer. Increasing miR-411 activated the p53 signaling pathway and promoted cell apoptosis, but inhibited cell proliferation, invasion and migration. 
STK17A, an miR-411 target, increased following miR-411 over-expression, whereas the p53 signaling pathway was activated following STK17A inhibition. It was observed that the effect of miR-411 inhibition was lost following STK17A silencing. These findings indicate that the miR-411-mediated direct suppression of STK17A induces apoptosis and suppresses the proliferation, migration and invasion of human cervical cancer cells via the p53 signaling pathway. Additionally, miR-411 and STK17A have predictive value for the efficacy of radiotherapy.
Affiliation(s)
- Wei Wei
- Department of Clinical Laboratory, Jining No. 1 People's Hospital, Jining, Shandong 272011, P.R. China
- Cun Liu
- Department of Clinical Laboratory, Jining No. 1 People's Hospital, Jining, Shandong 272011, P.R. China
10
Timilsina M, Yang H, Sahay R, Rebholz-Schuhmann D. Predicting links between tumor samples and genes using 2-Layered graph based diffusion approach. BMC Bioinformatics 2019; 20:462. [PMID: 31500564 PMCID: PMC6734347 DOI: 10.1186/s12859-019-3056-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/14/2018] [Accepted: 08/26/2019] [Indexed: 12/21/2022]
Abstract
Background Determining the association between a tumor sample and a gene is demanding because genetic experiments are costly. In addition, a discovered association between a tumor sample and a gene requires further clinical verification and validation. This entire process is time-consuming and expensive. For this reason, predicting the association between tumor samples and genes remains a challenge in biomedicine. Results Here we present a computational model based on a heat diffusion algorithm that can predict the association between tumor samples and genes. We propose a 2-layered graph. In the first layer, we construct a graph of tumor samples and genes in which these two types of nodes are connected by a “hasGene” relationship. In the second layer, the gene nodes are connected by an “interaction” relationship. We applied the heat diffusion algorithm to nine different variants of genetic interaction networks extracted from the STRING and BioGRID databases. The heat diffusion algorithm predicted the links between tumor samples and genes with a mean AUC-ROC score of 0.84. This score was obtained using weighted genetic interactions from the fusion or co-occurrence channels of the STRING database. For the unweighted genetic interactions from the BioGRID database, the algorithm predicted the links with an AUC-ROC score of 0.74. Conclusions We demonstrate that gene-gene interaction scores can improve the power of the heat diffusion model to predict links between tumor samples and genes. We show the efficient runtime of the heat diffusion algorithm on various genetic interaction networks. We statistically validated the quality of our predicted links between tumor samples and genes.
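The idea of ranking candidate links by diffused heat on a 2-layered graph can be sketched as below. This is a generic explicit-Euler discretisation of heat diffusion on an unweighted, undirected graph, given as an assumption-laden illustration rather than the authors' exact formulation; the function name `diffuse_heat`, the step size, the number of steps, and the toy graph are all hypothetical.

```python
def diffuse_heat(adjacency, heat, dt=0.1, steps=10):
    """Run explicit Euler steps of dh/dt = -L h, where L = D - A.

    `adjacency` maps each node to a list of its neighbours (undirected graph).
    `heat` maps each node to its initial heat. dt times the largest degree
    should stay below 1 for the iteration to remain stable.
    """
    h = dict(heat)
    for _ in range(steps):
        # Each node gains heat from hotter neighbours and loses it to cooler ones.
        h = {
            n: h[n] + dt * sum(h[m] - h[n] for m in adjacency[n])
            for n in adjacency
        }
    return h


# Hypothetical toy graph: one tumor sample linked to a gene ("hasGene"),
# and genes linked to each other ("interaction").
graph = {
    "T1": ["G1"],          # layer 1: T1 hasGene G1
    "G1": ["T1", "G2"],    # layer 2: G1 interacts with G2
    "G2": ["G1", "G3"],    # layer 2: G2 interacts with G3
    "G3": ["G2"],
}
initial = {"T1": 1.0, "G1": 0.0, "G2": 0.0, "G3": 0.0}
final = diffuse_heat(graph, initial)

# Rank genes not yet linked to T1 by the heat that reached them.
ranking = sorted(["G2", "G3"], key=lambda g: final[g], reverse=True)
```

In this toy case G2 ranks above G3 because it is closer to the heated tumor sample. Weighted interaction scores, as the abstract describes for STRING, would enter as edge weights in the sum.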
Affiliation(s)
- Mohan Timilsina
- Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland.
- Haixuan Yang
- School of Mathematics Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
- Ratnesh Sahay
- Insight Centre for Data Analytics, National University of Ireland Galway, Galway, Ireland
11
Text Filtering through Multi-Pattern Matching: A Case Study of Wu–Manber–Uy on the Language of Uyghur. INFORMATION 2019. [DOI: 10.3390/info10080246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Given its generality in applications and its high time-efficiency on big data sets, the technique of text filtering through pattern matching has in recent years been attracting increasing attention from the information retrieval and Natural Language Processing (NLP) research communities at large. However, it has yet to be seen how this technique and its algorithms (e.g., Wu–Manber, which is also considered in this paper) can be applied and adapted properly and effectively to Uyghur, a low-resource language that is mostly spoken by the ethnic Uyghur group, with a population of more than eleven million in Xinjiang, China. We observe that, technically, the challenge is mainly caused by two factors: (1) vowel weakening and (2) mismatching in semantics between affixes and stems. Accordingly, in this paper, we propose Wu–Manber–Uy, a variant of an improvement to Wu–Manber, dedicated particularly to working on the Uyghur language. Wu–Manber–Uy implements a stem deformation-based pattern expansion strategy, specifically for reducing the mismatching of patterns caused by vowel weakening and spelling errors. A two-way strategy that applies invigilation and control on the change of lexical meaning of stems during word-building is also used in Wu–Manber–Uy. Extra considerations with respect to Word2vec and the dictionary are incorporated into the system for processing Uyghur. The experimental results we have obtained consistently demonstrate the high performance of Wu–Manber–Uy.
12
Thilakaratne M, Falkner K, Atapattu T. A systematic review on literature-based discovery workflow. PeerJ Comput Sci 2019; 5:e235. [PMID: 33816888 PMCID: PMC7924697 DOI: 10.7717/peerj-cs.235] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/08/2019] [Accepted: 10/17/2019] [Indexed: 05/02/2023]
Abstract
As scientific publication rates increase, knowledge acquisition and the research development process have become more complex and time-consuming. Literature-Based Discovery (LBD), which supports automated knowledge discovery, helps facilitate this process by eliciting novel knowledge through the analysis of existing scientific literature. This systematic review provides a comprehensive overview of the LBD workflow by answering nine research questions related to the major components of the LBD workflow (i.e., input, process, output, and evaluation). With regard to the input component, we discuss the data types and data sources used in the literature. The process component presents filtering techniques, ranking/thresholding techniques, domains, generalisability levels, and resources. Subsequently, the output component focuses on the visualisation techniques used in the LBD discipline. As for the evaluation component, we outline the evaluation techniques, their generalisability, and the quantitative measures used to validate results. To conclude, we summarise the findings of the review for each component by highlighting the possible future research directions.
Affiliation(s)
- Menasha Thilakaratne
- Faculty of Engineering, Computer and Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Katrina Falkner
- Faculty of Engineering, Computer and Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Thushari Atapattu
- Faculty of Engineering, Computer and Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
13
Mining latent information in PTSD psychometrics with fuzziness for effective diagnoses. Sci Rep 2018; 8:16266. [PMID: 30389985 PMCID: PMC6214927 DOI: 10.1038/s41598-018-34573-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/15/2018] [Accepted: 10/19/2018] [Indexed: 02/05/2023]
Abstract
The response options of traditional self-report rating scales, such as the PTSD Checklist Civilian (PCL-C) scale, have no clear boundaries, which might cause considerable bias and low effectiveness. This research aimed to explore the feasibility of using fuzzy sets in data processing to improve the screening effectiveness of the PCL-C in real-life practical settings. The sensitivity, specificity, Youden's index, and related measures of the PCL-C at different cutoffs (38, 44, and 50) were analyzed and compared with those obtained after fuzzy set processing. In practice, whether the cutoff of the PCL-C was set at 50, 44, or 38, the PCL-C showed good specificity but failed to exhibit good sensitivity and screening effectiveness. The highest sensitivity was 65.22%, with a Youden's index of 0.64. After fuzzy processing, the sensitivity of the fuzzy-PCL-C increased to 91.30% and Youden's index rose to 0.91, a marked improvement. In conclusion, this study indicates that fuzzy sets can be used in the data processing of psychiatric scales whose options have no clear definition standard, to improve the effectiveness of the scales.
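For readers unfamiliar with the screening measures named in this abstract, the sketch below shows how sensitivity, specificity, and Youden's index (sensitivity + specificity − 1) are computed from a confusion matrix. The function name `screening_metrics` and the counts are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, Youden's J) from confusion-matrix counts.

    tp/fn: true cases screened positive/negative.
    tn/fp: non-cases screened negative/positive.
    """
    sensitivity = tp / (tp + fn)   # share of true cases the scale detects
    specificity = tn / (tn + fp)   # share of non-cases the scale clears
    youden_j = sensitivity + specificity - 1
    return sensitivity, specificity, youden_j


# Hypothetical counts at a single cutoff, for illustration only.
sens, spec, j = screening_metrics(tp=40, fn=10, tn=90, fp=10)
# Here sensitivity is 0.8, specificity is 0.9, and J is about 0.7.
```

Comparing J across candidate cutoffs is a common way to choose one: the cutoff with the largest J balances missed cases against false alarms.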
14
Mower J, Subramanian D, Cohen T. Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications. J Am Med Inform Assoc 2018; 25:1339-1350. [PMID: 30010902 PMCID: PMC6454491 DOI: 10.1093/jamia/ocy077] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/24/2018] [Revised: 04/23/2018] [Accepted: 06/05/2018] [Indexed: 02/01/2023]
Abstract
Objective The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
Affiliation(s)
- Justin Mower
- Baylor College of Medicine, Quantitative and Computational Biosciences, Houston, Texas, USA
- Trevor Cohen
- School of Biomedical Informatics, University of Texas Health Science Center Houston, Texas, USA
15
Lee JJY, Gottlieb MM, Lever J, Jones SJM, Blau N, van Karnebeek CDM, Wasserman WW. Text-based phenotypic profiles incorporating biochemical phenotypes of inborn errors of metabolism improve phenomics-based diagnosis. J Inherit Metab Dis 2018; 41:555-562. [PMID: 29340838 PMCID: PMC5959948 DOI: 10.1007/s10545-017-0125-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/31/2017] [Revised: 12/01/2017] [Accepted: 12/05/2017] [Indexed: 01/28/2023]
Abstract
Phenomics is the comprehensive study of phenotypes at every level of biology: from metabolites to organisms. With high throughput technologies increasing the scope of biological discoveries, the field of phenomics has been developing rapid and precise methods to collect, catalog, and analyze phenotypes. Such methods have allowed phenotypic data to be widely used in medical applications, from assisting clinical diagnoses to prioritizing genomic diagnoses. To channel the benefits of phenomics into the field of inborn errors of metabolism (IEM), we have recently launched IEMbase, an expert-curated knowledgebase of IEM and their disease-characterizing phenotypes. While our efforts with IEMbase have realized benefits, taking full advantage of phenomics requires a comprehensive curation of IEM phenotypes in core phenomics projects, which is dependent upon contributions from the IEM clinical and research community. Here, we assess the inclusion of IEM biochemical phenotypes in a core phenomics project, the Human Phenotype Ontology. We then demonstrate the utility of biochemical phenotypes using a text-based phenomics method to predict gene-disease relationships, showing that the prediction of IEM genes is significantly better using biochemical rather than clinical profiles. The findings herein provide a motivating goal for the IEM community to expand the computationally accessible descriptions of biochemical phenotypes associated with IEM in phenomics resources.
Affiliation(s)
- Jessica J Y Lee
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Room 3109, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Michael M Gottlieb
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Room 3109, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Jake Lever
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
- Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Nenad Blau
- Dietmar-Hopp Metabolic Center, Department of General Pediatrics, University Hospital, Heidelberg, Germany
- Clara D M van Karnebeek
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Room 3109, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada
- Departments of Pediatrics and Clinical Genetics, Emma Children's Hospital, Academic Medical Centre, Amsterdam, The Netherlands
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Room 3109, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada.