1
Li W, Deng K, Zhang M, Xu Y, Zhang J, Liang Q, Yang Z, Jin L, Hu C, Zhao YT. Network Pharmacology Combined with Experimental Validation to Investigate the Effects and Mechanisms of Aucubin on Aging-Related Muscle Atrophy. Int J Mol Sci 2025; 26:2626. [PMID: 40141269 PMCID: PMC11941843 DOI: 10.3390/ijms26062626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2025] [Revised: 03/10/2025] [Accepted: 03/13/2025] [Indexed: 03/28/2025]
Abstract
Aucubin (AU) is one of the main components of the traditional Chinese medicine Eucommia ulmoides Oliv. (EU). This study investigated the effects of AU on aging-related skeletal muscle atrophy in vitro and in vivo. The results of network pharmacology revealed the potential therapeutic effects of AU on muscle atrophy. In vitro, AU effectively attenuated D-gal-induced cellular damage, reduced the number of senescence-associated β-galactosidase (SA-β-Gal)-positive cells, down-regulated the expression levels of the muscle atrophy-related proteins Atrogin-1 and MuRF1, and improved myotube differentiation, thereby mitigating myotube atrophy. Notably, AU was found to attenuate oxidative stress and apoptosis in skeletal muscle cells by reducing ROS production, regulating Cleaved caspase3 and BAX/Bcl-2 expression in apoptotic pathways, and enhancing the Sirt1 and PGC-1α signaling pathways. In vivo studies demonstrated that AU treatment extended the average lifespan of Caenorhabditis elegans (C. elegans), increased locomotor activity, improved body wall muscle mitochondrial content, and alleviated oxidative damage in C. elegans. These findings suggest that AU can ameliorate aging-related muscle atrophy and shows significant potential for preventing and treating it.
Affiliation(s)
- Wenan Li
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Kaishu Deng
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Mengyue Zhang
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Yan Xu
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Jingxi Zhang
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Qingsheng Liang
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Zhiyou Yang
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China
- Leigang Jin
  - State Key Laboratory of Pharmaceutical Biotechnology, Department of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Chuanyin Hu
  - Department of Biology, Guangdong Medical University, Zhanjiang 524023, China
- Yun-Tao Zhao
  - Guangdong Province Engineering Laboratory for Marine Biological Products, Guangdong Provincial Key Laboratory of Aquatic Product Processing and Safety, College of Food Science and Technology, Modern Biochemistry Experimental Center, Zhanjiang Municipal Key Laboratory of Marine Drugs and Nutrition for Brain Health, Guangdong Ocean University, Zhanjiang 524088, China

2
Liu X, Gao L, Peng Y, Fang Z, Wang J. PheSom: a term frequency-based method for measuring human phenotype similarity on the basis of MeSH vocabulary. Front Genet 2023; 14:1185790. [PMID: 37496714 PMCID: PMC10366691 DOI: 10.3389/fgene.2023.1185790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023]
Abstract
Background: Phenotype similarity calculation should be used to help improve drug repurposing. In this study, based on the MeSH terms describing the phenotypes deposited in OMIM, we proposed a method, namely, PheSom (Phenotype Similarity On MeSH), to measure the similarity between phenotypes. PheSom counted the number of overlapping MeSH terms between two phenotypes and then took the weight of every MeSH term within each phenotype into account according to the term frequency-inverse document frequency (FIDC). Phenotype-related genes were used for the evaluation of our method. Results: A 7,739 × 7,739 similarity score matrix was finally obtained and the number of phenotype pairs was dramatically decreased with the increase of similarity score. Besides, the overlapping rates of phenotype-related genes were remarkably increased with the increase of similarity score between phenotypes, which supports the reliability of our method. Conclusion: We anticipate our method can be applied to identifying novel therapeutic methods for complex diseases.
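The idea of weighting shared MeSH terms by term frequency and inverse document frequency can be illustrated with a minimal sketch. The phenotype names, the term lists, and the use of cosine similarity as the final score are assumptions made for illustration; the abstract does not give the exact PheSom scoring formula.

```python
import math
from collections import Counter


def tfidf_vectors(phenotypes):
    """Return {phenotype: {term: tf * idf}} for lists of MeSH terms."""
    n_docs = len(phenotypes)
    # Document frequency: in how many phenotypes does each term occur?
    doc_freq = Counter()
    for terms in phenotypes.values():
        doc_freq.update(set(terms))
    vectors = {}
    for name, terms in phenotypes.items():
        counts = Counter(terms)
        total = sum(counts.values())
        vectors[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def similarity(vec_a, vec_b):
    """Cosine similarity; only terms shared by both phenotypes contribute."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: three phenotypes described by MeSH-style terms.
phenotypes = {
    "P1": ["Seizures", "Intellectual Disability", "Microcephaly"],
    "P2": ["Seizures", "Microcephaly", "Hearing Loss"],
    "P3": ["Cardiomyopathies", "Arrhythmias, Cardiac"],
}
vectors = tfidf_vectors(phenotypes)
print(round(similarity(vectors["P1"], vectors["P2"]), 3))
print(similarity(vectors["P1"], vectors["P3"]))
```

In this sketch, a term that occurs in every phenotype gets an inverse document frequency of zero and so contributes nothing, which is the usual reason for preferring such weighting over a raw overlap count.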
Affiliation(s)
- Xinhua Liu
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ling Gao
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Yonglin Peng
  - Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Zhonghai Fang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ju Wang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China

3
Binkheder S, Wu HY, Quinney SK, Zhang S, Zitu MM, Chiang CW, Wang L, Jones J, Li L. PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature. J Biomed Semantics 2022; 13:17. [PMID: 35690873 PMCID: PMC9188713 DOI: 10.1186/s13326-022-00272-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/07/2019] [Accepted: 05/18/2022] [Indexed: 12/28/2022]
Abstract
BACKGROUND Adverse events induced by drug-drug interactions are a major concern in the United States. Current research is moving toward using electronic health record (EHR) data, including for adverse drug events discovery. One of the first steps in EHR-based studies is to define a phenotype for establishing a cohort of patients. However, phenotype definitions are not readily available for all phenotypes. One of the first steps of developing automated text mining tools is building a corpus. Therefore, this study aimed to develop annotation guidelines and a gold standard corpus to facilitate building future automated approaches for mining phenotype definitions contained in the literature. Furthermore, our aim is to improve the understanding of how these published phenotype definitions are presented in the literature and how we annotate them for future text mining tasks. RESULTS Two annotators manually annotated the corpus on a sentence-level for the presence of evidence for phenotype definitions. Three major categories (inclusion, intermediate, and exclusion) with a total of ten dimensions were proposed characterizing major contextual patterns and cues for presenting phenotype definitions in published literature. The developed annotation guidelines were used to annotate the corpus that contained 3971 sentences: 1923 out of 3971 (48.4%) for the inclusion category, 1851 out of 3971 (46.6%) for the intermediate category, and 2273 out of 3971 (57.2%) for exclusion category. The highest number of annotated sentences was 1449 out of 3971 (36.5%) for the "Biomedical & Procedure" dimension. The lowest number of annotated sentences was 49 out of 3971 (1.2%) for "The use of NLP". The overall percent inter-annotator agreement was 97.8%. Percent and Kappa statistics also showed high inter-annotator agreement across all dimensions. 
CONCLUSIONS The corpus and annotation guidelines can serve as a foundational informatics approach for annotating and mining phenotype definitions in literature, and can be used later for text mining applications.
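The two agreement measures mentioned in the results, percent agreement and the Kappa statistic, can be computed as in the following minimal sketch. It assumes two annotators and uses Cohen's kappa; the label lists are invented toy data, not values from the PhenoDEF corpus.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical sentence-level labels (1 = evidence present, 0 = absent).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(round(percent_agreement(annotator_1, annotator_2), 3))
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa is typically reported alongside raw percent agreement because percent agreement alone can look high when one label dominates.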
Affiliation(s)
- Samar Binkheder
  - Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
  - Medical Informatics Unit, Department of Medical Education, College of Medicine, King Saud University, Riyadh, Saudi Arabia
- Heng-Yi Wu
  - Development Science Informatics, Genentech, South San Francisco, CA, USA
- Sara K Quinney
  - Department of Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, IN, USA
- Shijun Zhang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Md Muntasir Zitu
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Chien-Wei Chiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Lei Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Josette Jones
  - Department of Biohealth Informatics, Indiana University School of Informatics and Computing, Indianapolis, IN, USA
- Lang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
  - 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH, 43210, USA

4
Yates T, Lain A, Campbell J, FitzPatrick DR, Simpson TI. Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders. Database (Oxford) 2022; 2022:baac038. [PMID: 35670729 PMCID: PMC9216525 DOI: 10.1093/database/baac038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 03/26/2022] [Accepted: 05/25/2022] [Indexed: 11/24/2022]
Abstract
There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the wide-spread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 in title + abstract, to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038.
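Precision and recall figures like those reported for term extraction compare the set of extracted terms with a manually curated reference set. The sketch below shows that calculation for a single paper; the term identifiers and both sets are hypothetical and are not taken from the study.

```python
def precision_recall(predicted, gold):
    """Set-based precision and recall of extracted terms against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: ontology term IDs extracted from one paper
# versus the terms a curator assigned to the same paper.
extracted = {"HP:0001250", "HP:0000252", "HP:0001263", "HP:0000365"}
curated = {"HP:0001250", "HP:0000252", "HP:0001263", "HP:0004322", "HP:0000486"}

precision, recall = precision_recall(extracted, curated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the four extracted terms are in the curated set, and three of the five curated terms were found, so precision is higher than recall in this toy case.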
Affiliation(s)
- T M Yates
  - MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK
  - Transforming Genetic Medicine Initiative, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- A Lain
  - Institute for Adaptive and Neural Computation, Informatics Forum, The University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
- J Campbell
  - MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK
  - Simons Initiative for the Developing Brain, The University of Edinburgh, Hugh Robson Building, George Square, Edinburgh EH8 9XF, UK
- D R FitzPatrick
  - MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK
  - Transforming Genetic Medicine Initiative, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Simons Initiative for the Developing Brain, The University of Edinburgh, Hugh Robson Building, George Square, Edinburgh EH8 9XF, UK
- T I Simpson
  - Institute for Adaptive and Neural Computation, Informatics Forum, The University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
  - Simons Initiative for the Developing Brain, The University of Edinburgh, Hugh Robson Building, George Square, Edinburgh EH8 9XF, UK

5
Birgmeier J, Haeussler M, Deisseroth CA, Steinberg EH, Jagadeesh KA, Ratner AJ, Guturu H, Wenger AM, Diekhans ME, Stenson PD, Cooper DN, Ré C, Beggs AH, Bernstein JA, Bejerano G. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci Transl Med 2021; 12:12/544/eaau9113. [PMID: 32434849 DOI: 10.1126/scitranslmed.aau9113] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 07/30/2018] [Revised: 08/14/2019] [Accepted: 04/22/2020] [Indexed: 12/21/2022]
Abstract
The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
Affiliation(s)
- Johannes Birgmeier
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Maximilian Haeussler
  - Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cole A Deisseroth
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ethan H Steinberg
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Karthik A Jagadeesh
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Alexander J Ratner
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Harendra Guturu
  - Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
- Aaron M Wenger
  - Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
- Mark E Diekhans
  - Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
- Christopher Ré
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Alan H Beggs
  - Manton Center for Orphan Disease Research, Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Gill Bejerano
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA

6
Zhao S, Su C, Lu Z, Wang F. Recent advances in biomedical literature mining. Brief Bioinform 2021; 22:bbaa057. [PMID: 32422651 PMCID: PMC8138828 DOI: 10.1093/bib/bbaa057] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 12/11/2019] [Revised: 03/22/2020] [Accepted: 03/25/2020] [Indexed: 01/26/2023]
Abstract
Recent years have witnessed a rapid increase in the number of scientific articles in the biomedical domain. This literature is mostly available and readily accessible in electronic format. The domain knowledge hidden in it is critical for biomedical research and applications, which puts biomedical literature mining (BLM) techniques in high demand. Numerous efforts have been made on this topic by both the biomedical informatics (BMI) and computer science (CS) communities. The BMI community focuses more on concrete application problems and thus prefers more interpretable and descriptive methods, while the CS community pursues superior performance and generalization ability and thus develops more sophisticated and universal models. The goal of this paper is to provide a review of the recent advances in BLM from both communities and inspire new research directions.
Affiliation(s)
- Sendong Zhao
  - Department of Healthcare Policy and Research, Weill Medical College of Cornell University, New York, NY 10065, USA
- Chang Su
  - Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY, USA
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI) at National Library of Medicine, National Institute of Health, Bethesda, MD, USA
- Fei Wang
  - Department of Healthcare Policy and Research, Weill Medical College of Cornell University, New York, NY 10065, USA

7
DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier. PLoS Comput Biol 2020; 16:e1008453. [PMID: 33206638 PMCID: PMC7710064 DOI: 10.1371/journal.pcbi.1008453] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/16/2020] [Revised: 12/02/2020] [Accepted: 10/20/2020] [Indexed: 12/21/2022]
Abstract
Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network-based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top-performing CAFA2 methods as well as several state-of-the-art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.
Gene–phenotype associations can help to understand the underlying mechanisms of many genetic diseases. However, experimental identification, often involving animal models, is time consuming and expensive. Computational methods that predict gene–phenotype associations can be used instead. We developed DeepPheno, a novel approach for predicting the phenotypes resulting from a loss of function of a single gene. We use gene functions and gene expression as information to predict phenotypes. Our method uses a neural network classifier that is able to account for hierarchical dependencies between phenotypes. We extensively evaluate our method and compare it with related approaches, and we show that DeepPheno results in better performance in several evaluations. Furthermore, we found that many of the new predictions made by our method have been added to phenotype association databases released one year later. Overall, DeepPheno simulates some aspects of human physiology and how molecular and physiological alterations lead to abnormal phenotypes.

8
Ju M, Short AD, Thompson P, Bakerly ND, Gkoutos GV, Tsaprouni L, Ananiadou S. Annotating and detecting phenotypic information for chronic obstructive pulmonary disease. JAMIA Open 2020; 2:261-271. [PMID: 31984360 PMCID: PMC6951876 DOI: 10.1093/jamiaopen/ooz009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/19/2018] [Revised: 02/21/2019] [Accepted: 03/19/2019] [Indexed: 12/29/2022]
Abstract
Objectives Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information. Materials and methods Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network firstly recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognize enclosing phenotype mentions. Results Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information. Discussion Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments. Conclusion The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.
Affiliation(s)
- Meizhi Ju
  - National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Andrea D Short
  - Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
- Paul Thompson
  - National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Nawar Diar Bakerly
  - Salford Royal NHS Foundation Trust; and School of Health Sciences, The University of Manchester, Manchester, UK
- Georgios V Gkoutos
  - College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - MRC Health Data Research UK (HDR UK)
  - NIHR Experimental Cancer Medicine Centre, Birmingham, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, UK
  - NIHR Biomedical Research Centre, Birmingham, UK
- Loukia Tsaprouni
  - School of Health Sciences, Centre for Life and Sport Sciences, Birmingham City University, Birmingham, UK
- Sophia Ananiadou
  - National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK

9
Braun IR, Lawrence-Dill CJ. Automated Methods Enable Direct Computation on Phenotypic Descriptions for Novel Candidate Gene Prediction. Front Plant Sci 2020; 10:1629. [PMID: 31998331 PMCID: PMC6965352 DOI: 10.3389/fpls.2019.01629] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/05/2019] [Accepted: 11/19/2019] [Indexed: 06/01/2023]
Abstract
Natural language descriptions of plant phenotypes are a rich source of information for genetics and genomics research. We computationally translated descriptions of plant phenotypes into structured representations that can be analyzed to identify biologically meaningful associations. These representations include the entity-quality (EQ) formalism, which uses terms from biological ontologies to represent phenotypes in a standardized, semantically rich format, as well as numerical vector representations generated using natural language processing (NLP) methods (such as the bag-of-words approach and document embedding). We compared resulting phenotype similarity measures to those derived from manually curated data to determine the performance of each method. Computationally derived EQ and vector representations were comparably successful in recapitulating biological truth to representations created through manual EQ statement curation. Moreover, NLP methods for generating vector representations of phenotypes are scalable to large quantities of text because they require no human input. These results indicate that it is now possible to computationally and automatically produce and populate large-scale information resources that enable researchers to query phenotypic descriptions directly.
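The bag-of-words approach named above can be sketched as follows: each free-text description becomes a word-count vector, and two descriptions are compared by cosine similarity. The tokenizer, the example descriptions, and the choice of cosine similarity are assumptions for illustration and may differ from the authors' pipeline.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(bag_a[word] * bag_b[word] for word in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical plant phenotype descriptions.
descriptions = [
    "Dwarf plant with narrow leaves",
    "Plant with short stature and narrow leaves",
    "Kernels are purple",
]
bags = [bag_of_words(d) for d in descriptions]
print(round(cosine_similarity(bags[0], bags[1]), 3))
print(cosine_similarity(bags[0], bags[2]))
```

The first two descriptions share several words and score well above zero, while the third shares none with the first and scores zero. This also shows the main limitation of raw word counts: synonyms such as "dwarf" and "short stature" are not matched, which is one motivation for the ontology-based and embedding-based representations the abstract mentions.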
Affiliation(s)
- Ian R. Braun
  - Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
  - Interdepartmental Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
- Carolyn J. Lawrence-Dill
  - Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, United States
  - Interdepartmental Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
  - Department of Agronomy, Iowa State University, Ames, IA, United States

10
Vervier K, Michaelson JJ. TiSAn: estimating tissue-specific effects of coding and non-coding variants. Bioinformatics 2019; 34:3061-3068. [PMID: 29912365 PMCID: PMC6137979 DOI: 10.1093/bioinformatics/bty301] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/08/2017] [Accepted: 04/16/2018] [Indexed: 02/06/2023]
Abstract
Motivation: Model-based estimates of general deleteriousness, like CADD, DANN or PolyPhen, have become indispensable tools in the interpretation of genetic variants. However, these approaches say little about the tissues in which the effects of deleterious variants will be most meaningful. Tissue-specific annotations have been recently inferred for dozens of tissues/cell types from large collections of cross-tissue epigenomic data, and have demonstrated sensitivity in predicting affected tissues in complex traits. It remains unclear, however, whether including additional genome-scale data specific to the tissue of interest would appreciably improve functional annotations. Results: Herein, we introduce TiSAn, a tool that integrates multiple genome-scale data sources, defined by expert knowledge. TiSAn uses machine learning to discriminate variants relevant to a tissue from those with no bearing on the function of that tissue. Predictions are made genome-wide, and can be used to contextualize and filter variants of interest in whole genome sequencing or genome-wide association studies. We demonstrate the accuracy and flexibility of TiSAn by producing predictive models for human heart and brain, and detecting tissue-relevant variations in large cohorts for autism spectrum disorder (TiSAn-brain) and coronary artery disease (TiSAn-heart). We find the multiomics TiSAn model is better able to prioritize genetic variants according to their tissue-specific action than the current state-of-the-art method, GenoSkyLine. Availability and implementation: Software and vignettes are available at http://github.com/kevinVervier/TiSAn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kévin Vervier
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
- Jacob J Michaelson
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
|
11
|
Tsueng G, Nanis M, Fouquier JT, Mayers M, Good BM, Su AI. Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts. Bioinformatics 2019; 36:1226-1233. [PMID: 31504205 PMCID: PMC8104067 DOI: 10.1093/bioinformatics/btz678] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/17/2019] [Revised: 08/05/2019] [Accepted: 08/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION: Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and able to perform named entity recognition of disease mentions in biomedical abstracts, but did not know whether this was also true of relationship extraction (RE). RESULTS: In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition on user performance of RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing. AVAILABILITY AND IMPLEMENTATION: Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Max Nanis
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Jennifer T Fouquier
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Michael Mayers
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Benjamin M Good
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
|
12
|
Manosroi W, Williams GH. Genetics of Human Primary Hypertension: Focus on Hormonal Mechanisms. Endocr Rev 2019; 40:825-856. [PMID: 30590482 PMCID: PMC6936319 DOI: 10.1210/er.2018-00071] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 03/01/2018] [Accepted: 09/07/2018] [Indexed: 02/06/2023]
Abstract
Increasingly, primary hypertension is being considered a syndrome and not a disease, with the individual causes (diseases) having a common sign: an elevated blood pressure. To determine these causes, genetic tools are increasingly employed. This review identified 62 proposed genes. However, only 21 of them met our inclusion criteria: (i) primary hypertension, (ii) two or more supporting cohorts from different publications or within a single publication or one supporting cohort with a confirmatory genetically modified animal study, and (iii) 600 or more subjects in the primary cohort; when including our exclusion criteria: (i) meta-analyses or reviews, (ii) secondary and monogenic hypertension, (iii) only hypertensive complications, (iv) genes related to blood pressure but not hypertension per se, (v) nonsupporting studies more common than supporting ones, and (vi) studies that did not perform a Bonferroni or similar multiassessment correction. These 21 genes were organized in a four-tiered structure: distant phenotype (hypertension); intermediate phenotype [salt-sensitive (18) or salt-resistant (0)]; subintermediate phenotypes under salt-sensitive hypertension [normal renin (4), low renin (8), and unclassified renin (6)]; and proximate phenotypes (specific genetically driven hypertensive subgroup). Many proximate hypertensive phenotypes had a substantial endocrine component. In conclusion, primary hypertension is a syndrome; many proposed genes are likely to be false positives; and deep phenotyping will be required to determine the utility of genetics in the treatment of hypertension. However, to date, the positive genes are associated with nearly 50% of primary hypertensives, suggesting that in the near term precise, mechanistically driven treatment and prevention strategies for the specific primary hypertension subgroups are feasible.
Affiliation(s)
- Worapaka Manosroi
- Division of Endocrinology, Diabetes, and Hypertension, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts; Division of Endocrinology and Metabolism, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Gordon H Williams
- Division of Endocrinology, Diabetes, and Hypertension, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
|
13
|
Abstract
Inherited metabolic disorders (IMDs) are debilitating inherited diseases, with phenotypic, biochemical and genetic heterogeneity, frequently leading to prolonged diagnostic odysseys. Mitochondrial disorders represent one of the most severe classes of IMDs, wherein defects in >350 genes lead to multi-system disease. Diagnostic rates have improved considerably following the adoption of next-generation sequencing (NGS) technologies, but are still far from perfect. Phenomic annotation is an emerging concept which is being utilised to enhance interpretation of NGS results. To test whether phenomic correlations have utility in mitochondrial disease and IMDs, we created a gene-to-phenotype interaction network with searchable elements, for Leigh syndrome, a frequently observed paediatric mitochondrial disorder. The Leigh Map comprises data on 92 genes and 275 phenotypes standardised in human phenotype ontology terms, with 80% predictive accuracy. This commentary highlights the usefulness of the Leigh Map and similar resources and the challenges associated with integrating phenomic technologies into clinical practice.
Affiliation(s)
- Joyeeta Rahman
- UCL Great Ormond Street Institute of Child Health, London, UK
- Shamima Rahman
- UCL Great Ormond Street Institute of Child Health, London, UK
|
14
|
Xing W, Qi J, Yuan X, Li L, Zhang X, Fu Y, Xiong S, Hu L, Peng J. A gene-phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach. Bioinformatics 2018; 34:i386-i394. [PMID: 29950017 PMCID: PMC6022650 DOI: 10.1093/bioinformatics/bty263] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 01/01/2023] Open
Abstract
Motivation: The fundamental challenge of modern genetic analysis is to establish gene-phenotype correlations, which are often found in large-scale publications. Because the lexical features of genes are relatively regular in text, the main challenge of this relation extraction is phenotype recognition. Because phenotypic descriptions are often study- or author-specific, few lexicons can be used to effectively identify the full range of phenotypic expressions in text, especially for plants. Results: We have proposed a pipeline for extracting phenotypes, genes and their relations from the biomedical literature. Combined with abbreviation revision and sentence template extraction, we improved the unsupervised word-embedding-to-sentence-embedding cascaded approach as representation learning to recognize the broad variety of phenotypic information in the literature. In addition, a dictionary- and rule-based method was applied for gene recognition. Finally, we integrated the well-known information extraction system OLLIE to identify gene-phenotype relations. To demonstrate the applicability of the pipeline, we established two types of comparison experiments using the model organism Arabidopsis thaliana. In the comparison with state-of-the-art baselines, our approach obtained the best performance (F1-measure of 66.83%). We also applied the pipeline to 481 full articles from the TAIR gene-phenotype manual relationship dataset to demonstrate its validity. The results showed that our proposed pipeline can cover 70.94% of the original dataset and add 373 new relations to expand it. Availability and implementation: The source code is available at http://www.wutbiolab.cn:82/Gene-Phenotype-Relation-Extraction-Pipeline.zip. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenhui Xing
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Junsheng Qi
- Department of Plant Science, College of Biological Science, China Agricultural University, Beijing, China
- Xiaohui Yuan
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lin Li
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Xiaoyu Zhang
- Britton Chance Center for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics-Huazhong University of Science and Technology, Wuhan, China
- Yuhua Fu
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Shengwu Xiong
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Lun Hu
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
- Jing Peng
- School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
|
15
|
Henderson J, Ke J, Ho JC, Ghosh J, Wallace BC. Phenotype Instance Verification and Evaluation Tool (PIVET): A Scaled Phenotype Evidence Generation Framework Using Web-Based Medical Literature. J Med Internet Res 2018; 20:e164. [PMID: 29728351 PMCID: PMC5960038 DOI: 10.2196/jmir.9610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/14/2017] [Revised: 02/26/2018] [Accepted: 02/28/2018] [Indexed: 12/24/2022] Open
Abstract
Background: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. Objective: The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. Methods: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET’s phenotype representation with PheKnow-Cloud’s by using PheKnow-Cloud’s experimental setup. In PIVET’s framework, we also introduce a statistical model trained on domain expert–verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner. Results: PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET’s analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Conclusions: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.
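The co-occurrence analysis named in this abstract can be pictured with a small sketch. It deliberately uses a naive substring scan over an in-memory list of texts rather than the indexed NoSQL lookup and Aho-Corasick-inspired matcher the abstract describes; the function name, the terms, and the article snippets are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(phenotype_terms, articles):
    """Count, for each pair of terms, the articles mentioning both.

    Naive substring matching (assumption): a stand-in for an indexed,
    multi-pattern matcher, adequate only for tiny examples.
    """
    counts = Counter()
    for text in articles:
        lowered = text.lower()
        present = sorted(t for t in phenotype_terms if t.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented toy inputs (assumption), not drawn from any real corpus.
terms = ["hypertension", "diabetes", "obesity"]
articles = [
    "Hypertension and diabetes often co-occur.",
    "Obesity is a risk factor for diabetes and hypertension.",
    "Asthma in children.",
]
pair_counts = cooccurrence_counts(terms, articles)
```

In a sketch like this, pair counts could serve as raw evidence that the items of a candidate phenotype appear together in the literature; how such counts are turned into a relevance decision is left to the classifier the abstract mentions.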
Affiliation(s)
- Jette Henderson
- The University of Texas at Austin, Austin, TX, United States
- Junyuan Ke
- Emory University, Atlanta, GA, United States
- Joyce C Ho
- Emory University, Atlanta, GA, United States
- Joydeep Ghosh
- The University of Texas at Austin, Austin, TX, United States
|
16
|
Felgueiras J, Silva JV, Fardilha M. Adding biological meaning to human protein-protein interactions identified by yeast two-hybrid screenings: A guide through bioinformatics tools. J Proteomics 2018; 171:127-140. [PMID: 28526529 DOI: 10.1016/j.jprot.2017.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Revised: 04/26/2017] [Accepted: 05/13/2017] [Indexed: 02/02/2023]
|
17
|
Taboada M, Rodriguez H, Gudivada RC, Martinez D. A new synonym-substitution method to enrich the human phenotype ontology. BMC Bioinformatics 2017; 18:446. [PMID: 29017443 PMCID: PMC5635572 DOI: 10.1186/s12859-017-1858-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/23/2017] [Accepted: 10/02/2017] [Indexed: 12/29/2022] Open
Abstract
Background: Named entity recognition is critical for biomedical text mining, where it is not unusual to find entities labeled by a wide range of different terms. Nowadays, ontologies are one of the crucial enabling technologies in bioinformatics, providing resources for improved natural language processing tasks. However, biomedical ontology-based named entity recognition continues to be a major research problem. Results: This paper presents an automated synonym-substitution method to enrich the Human Phenotype Ontology (HPO) with new synonyms. The approach is mainly based on both the lexical properties of the terms and the hierarchical structure of the ontology. By scanning the lexical difference between a term and its descendant terms, the method can learn new names and modifiers in order to generate synonyms for the descendant terms. By searching for the exact phrases in MEDLINE, the method can automatically rule out illogical candidate synonyms. In total, 745 new terms were identified. These terms were indirectly evaluated through the concept annotations on a gold standard corpus and also by document retrieval on a collection of abstracts on hereditary diseases. A moderate improvement in the F-measure performance on the gold standard corpus was observed. Additionally, 6% more abstracts on hereditary diseases were retrieved, and this percentage was 33% higher if only the highly informative concepts were considered. Conclusions: A synonym-substitution procedure that leverages the HPO hierarchical structure works well for a reliable and automatic extension of the terminology. The results show that the generated synonyms have a positive impact on concept recognition, mainly those synonyms corresponding to highly informative HPO terms. Electronic supplementary material: The online version of this article (10.1186/s12859-017-1858-7) contains supplementary material, which is available to authorized users.
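The substitution idea in this abstract can be sketched in a heavily simplified form: when a descendant term's label contains its ancestor's label, each known synonym of the ancestor is swapped in to propose a candidate, and candidates are kept only if the exact phrase is attested in a corpus. The function names, the term labels, and the in-memory phrase set standing in for a MEDLINE search are all assumptions made for illustration; they are not actual HPO entries or the authors' implementation.

```python
import re


def candidate_synonyms(parent_label, parent_synonyms, descendant_label):
    """Propose descendant synonyms by swapping in the parent's synonyms."""
    # Word boundaries keep "renal cyst" from matching inside "adrenal cyst".
    pattern = re.compile(r"\b" + re.escape(parent_label.lower()) + r"\b")
    descendant = descendant_label.lower()
    if not pattern.search(descendant):
        return set()
    candidates = set()
    for synonym in parent_synonyms:
        replacement = synonym.lower()
        # A function replacement avoids backslash/group interpretation.
        candidates.add(pattern.sub(lambda _m, r=replacement: r, descendant))
    return candidates


def keep_attested(candidates, corpus_phrases):
    """Keep only candidates found verbatim in the (hypothetical) corpus."""
    return {c for c in candidates if c in corpus_phrases}


# Invented labels and phrases (assumption), for illustration only.
proposed = candidate_synonyms(
    "renal cyst", ["kidney cyst", "cyst of kidney"], "Bilateral renal cyst"
)
attested = keep_attested(proposed, {"bilateral kidney cyst"})
```

The filtering step mirrors the role the abstract gives to exact-phrase search: a substitution that yields a phrase nobody writes is discarded rather than added to the terminology.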
Affiliation(s)
- Maria Taboada
- Department of Electronics & Computer Science, University of Santiago de Compostela, Campus Vida, Santiago de Compostela, 15705, Spain
- Hadriana Rodriguez
- Department of Electronics & Computer Science, University of Santiago de Compostela, Campus Vida, Santiago de Compostela, 15705, Spain
- Diego Martinez
- Department of Applied Physics, University of Santiago de Compostela, 15705, Santiago de Compostela, Campus Vida, Spain
|
18
|
Wang X, Yu S, Jia Q, Chen L, Zhong J, Pan Y, Shen P, Shen Y, Wang S, Wei Z, Cao Y, Lu Y. NiaoDuQing granules relieve chronic kidney disease symptoms by decreasing renal fibrosis and anemia. Oncotarget 2017; 8:55920-55937. [PMID: 28915563 PMCID: PMC5593534 DOI: 10.18632/oncotarget.18473] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 09/09/2016] [Accepted: 05/23/2017] [Indexed: 11/25/2022] Open
Abstract
NiaoDuQing (NDQ) granules, a traditional Chinese medicine, have been used clinically in China for over fourteen years to treat chronic kidney disease (CKD). To elucidate the mechanisms underlying the therapeutic benefits of NDQ, we designed an approach incorporating chemoinformatics, bioinformatics, network biology methods, and cellular and molecular biology experiments. A total of 182 active compounds were identified in NDQ granules, and 397 putative targets associated with different diseases were derived through ADME modelling and target prediction tools. Protein-protein interaction networks of CKD-related and putative NDQ targets were constructed, and 219 candidate targets were identified based on topological features. Pathway enrichment analysis showed that the candidate targets were mostly related to the TGF-β, p38MAPK, and erythropoietin (EPO) receptor signaling pathways, which are known contributors to renal fibrosis and/or renal anemia. A rat model of CKD was established to validate the drug-target mechanisms predicted by the systems pharmacology analysis. Experimental results confirmed that NDQ granules exerted therapeutic effects on CKD and its comorbidities, including renal anemia, mainly by modulating the TGF-β and EPO signaling pathways. Thus, the pharmacological actions of NDQ on CKD symptoms correlated well with in silico predictions.
Affiliation(s)
- Xu Wang
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Suyun Yu
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Qi Jia
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Lichuan Chen
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Jinqiu Zhong
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Yanhong Pan
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Peiliang Shen
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Yin Shen
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Siliang Wang
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Zhonghong Wei
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Yuzhu Cao
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China
- Yin Lu
- Jiangsu Key Laboratory for Pharmacology and Safety Evaluation of Chinese Materia Medica, School of Pharmacy, Nanjing University of Chinese Medicine, Nanjing, P. R. China; Jiangsu Collaborative Innovation Center of Traditional Chinese Medicine Prevention and Treatment of Tumor, Nanjing University of Chinese Medicine, Nanjing, P. R. China
|
19
|
Alnazzawi N, Thompson P, Ananiadou S. Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource. PLoS One 2016; 11:e0162287. [PMID: 27643689 PMCID: PMC5028053 DOI: 10.1371/journal.pone.0162287] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/09/2016] [Accepted: 08/19/2016] [Indexed: 02/02/2023] Open
Abstract
Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus, a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm’s wider utility, by evaluating its ability to link mentions of various other types of medically-related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
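Linking a mention to a concept by combining several similarity measures can be sketched as follows. The two measures used here (token-set Jaccard overlap and a character-level ratio from Python's `difflib`), their equal weighting, the function names, and the placeholder concept identifiers are all assumptions chosen for illustration; the abstract does not specify which measures PhenoNorm combines or how.

```python
from difflib import SequenceMatcher


def token_jaccard(a, b):
    """Jaccard overlap between the word sets of two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_mention(mention, concepts):
    """Return the id of the concept whose name best matches the mention.

    Equal-weight average of the two measures (assumption).
    """
    def score(name):
        return (token_jaccard(mention, name) + char_ratio(mention, name)) / 2

    return max(concepts, key=lambda cid: score(concepts[cid]))


# Placeholder ids and names (assumption); these are not real UMLS entries.
concepts = {"X1": "dyspnea", "X2": "breath shortness", "X3": "chest pain"}
best = link_mention("shortness of breath", concepts)
```

The example also shows the limit of purely lexical measures: the word-order variant is linked correctly, while a true synonym with no shared characters ("dyspnea") would need an additional, knowledge-based measure.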
Affiliation(s)
- Noha Alnazzawi
- National Centre for Text Mining, Manchester Institute of Biotechnology, Manchester University, Manchester, United Kingdom
- Paul Thompson
- National Centre for Text Mining, Manchester Institute of Biotechnology, Manchester University, Manchester, United Kingdom
- Sophia Ananiadou
- National Centre for Text Mining, Manchester Institute of Biotechnology, Manchester University, Manchester, United Kingdom
|
20
|
Lelieveld SH, Veltman JA, Gilissen C. Novel bioinformatic developments for exome sequencing. Hum Genet 2016; 135:603-14. [PMID: 27075447 PMCID: PMC4883269 DOI: 10.1007/s00439-016-1658-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 02/01/2016] [Accepted: 03/15/2016] [Indexed: 01/19/2023]
Abstract
With the widespread adoption of next generation sequencing technologies by the genetics community and the rapid decrease in costs per base, exome sequencing has become a standard within the repertoire of genetic experiments for both research and diagnostics. Although bioinformatics now offers standard solutions for the analysis of exome sequencing data, many challenges remain. In particular, the increasing scale at which exome data are now being generated has given rise to novel challenges in how to efficiently store, analyze and interpret exome data of this magnitude. In this review we discuss some of the recent developments in bioinformatics for exome sequencing and the directions in which they are taking us. With these developments, exome sequencing is paving the way for the next big challenge, the application of whole genome sequencing.
Affiliation(s)
- Stefan H Lelieveld
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
- Joris A Veltman
- Department of Human Genetics, Donders Centre for Neuroscience, Radboudumc, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
- Department of Clinical Genetics, GROW-School for Oncology and Developmental Biology, Maastricht University Medical Centre, Universiteitssingel 50, 6229 ER, Maastricht, The Netherlands
- Christian Gilissen
- Department of Human Genetics, Donders Centre for Neuroscience, Radboudumc, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
|
21
|
Lelieveld SH, Veltman JA, Gilissen C. Novel bioinformatic developments for exome sequencing. Hum Genet 2016. [PMID: 27075447 DOI: 10.1007/s00439-016-1658-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/07/2023]
|