1
Li YH, Li XX, Hong JJ, Wang YX, Fu JB, Yang H, Yu CY, Li FC, Hu J, Xue WW, Jiang YY, Chen YZ, Zhu F. Clinical trials, progression-speed differentiating features and swiftness rule of the innovative targets of first-in-class drugs. Brief Bioinform 2021; 21:649-662. [PMID: 30689717 PMCID: PMC7299286 DOI: 10.1093/bib/bby130] [Citation(s) in RCA: 122] [Impact Index Per Article: 30.5] [Received: 06/13/2018] [Revised: 11/01/2018] [Accepted: 11/02/2018] [Indexed: 12/14/2022] Open
Abstract
Drugs produce their therapeutic effects by modulating specific targets, and there are 89 innovative targets of first-in-class drugs approved in 2004–17, each with information about drug clinical trial dated back to 1984. Analysis of the clinical trial timelines of these targets may reveal the trial-speed differentiating features for facilitating target assessment. Here we present a comprehensive analysis of all these 89 targets, following the earlier studies for prospective prediction of clinical success of the targets of clinical trial drugs. Our analysis confirmed the literature-reported common druggability characteristics for clinical success of these innovative targets, exposed trial-speed differentiating features associated to the on-target and off-target collateral effects in humans and further revealed a simple rule for identifying the speedy human targets through clinical trials (from the earliest phase I to the 1st drug approval within 8 years). This simple rule correctly identified 75.0% of the 28 speedy human targets and only unexpectedly misclassified 13.2% of 53 non-speedy human targets. Certain extraordinary circumstances were also discovered to likely contribute to the misclassification of some human targets by this simple rule. Investigation and knowledge of trial-speed differentiating features enable prioritized drug discovery and development.
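The percentages quoted in this abstract can be turned back into target counts with a little arithmetic. The sketch below is only an illustration: it assumes the percentages round to whole targets, and the derived metric names (sensitivity, specificity, precision) are standard terms that the abstract itself does not use.

```python
# Totals quoted in the abstract.
speedy_total = 28
non_speedy_total = 53

# Counts implied by the quoted percentages, rounded to whole targets (an assumption).
correctly_identified = round(0.750 * speedy_total)       # speedy targets the rule found
misclassified = round(0.132 * non_speedy_total)          # non-speedy targets flagged as speedy

sensitivity = correctly_identified / speedy_total
specificity = (non_speedy_total - misclassified) / non_speedy_total
precision = correctly_identified / (correctly_identified + misclassified)

print(correctly_identified, misclassified)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} precision={precision:.3f}")
```

Under that rounding assumption, the rule identifies 21 of the 28 speedy targets and misclassifies 7 of the 53 non-speedy ones.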
Affiliation(s)
- Ying Hong Li
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Xiao Xu Li
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Jia Jun Hong
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yun Xia Wang
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jian Bo Fu
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Hong Yang
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Chun Yan Yu
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Feng Cheng Li
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Jie Hu
  - School of International Studies, Zhejiang University, Hangzhou, China
- Wei Wei Xue
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Yu Yang Jiang
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, Guangdong, China
- Yu Zong Chen
  - Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Singapore, Singapore
- Feng Zhu
  - Lab of Innovative Drug Research and Bioinformatics, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
  - Innovative Drug Research and Bioinformatics Group, School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
2
Yin J, Li X, Li F, Lu Y, Zeng S, Zhu F. Identification of the key target profiles underlying the drugs of narrow therapeutic index for treating cancer and cardiovascular disease. Comput Struct Biotechnol J 2021; 19:2318-2328. [PMID: 33995923 PMCID: PMC8105181 DOI: 10.1016/j.csbj.2021.04.035] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/16/2020] [Revised: 04/09/2021] [Accepted: 04/15/2021] [Indexed: 12/14/2022] Open
Abstract
An appropriate therapeutic index is crucial for drug discovery and development since narrow therapeutic index (NTI) drugs with slight dosage variation may induce severe adverse drug reactions or potential treatment failure. To date, the shared characteristics underlying the targets of NTI drugs have been explored by several studies, which have been applied to identify potential drug targets. However, the association between the drug therapeutic index and the related disease has not been dissected, which is important for revealing the NTI drug mechanism and optimizing drug design. Therefore, in this study, two classes of disease (cancers and cardiovascular disorders) with the largest number of NTI drugs were selected, and the target property of the corresponding NTI drugs was analyzed. By calculating the biological system profiles and human protein–protein interaction (PPI) network properties of drug targets and adopting an AI-based algorithm, differentiated features between two diseases were discovered to reveal the distinct underlying mechanisms of NTI drugs in different diseases. Consequently, ten shared features and four unique features were identified for both diseases to distinguish NTI from NNTI drug targets. These computational discoveries, as well as the newly found features, suggest that in the clinical study of avoiding narrow therapeutic index in those diseases, the ability of target to be a hub and the efficiency of target signaling in the human PPI network should be considered, and it could thus provide novel guidance in the drug discovery and clinical research process and help to estimate the drug safety of cancer and cardiovascular disease.
Affiliation(s)
- Jiayi Yin
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaoxu Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yinjing Lu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Su Zeng
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
  - Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
3
Sulakhe D, D'Souza M, Wang S, Balasubramanian S, Athri P, Xie B, Canzar S, Agam G, Gilliam TC, Maltsev N. Exploring the functional impact of alternative splicing on human protein isoforms using available annotation sources. Brief Bioinform 2020; 20:1754-1768. [PMID: 29931155 DOI: 10.1093/bib/bby047] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/19/2018] [Revised: 05/02/2018] [Indexed: 12/30/2022] Open
Abstract
In recent years, the emphasis of scientific inquiry has shifted from whole-genome analyses to an understanding of cellular responses specific to tissue, developmental stage or environmental conditions. One of the central mechanisms underlying the diversity and adaptability of the contextual responses is alternative splicing (AS). It enables a single gene to encode multiple isoforms with distinct biological functions. However, to date, the functions of the vast majority of differentially spliced protein isoforms are not known. Integration of genomic, proteomic, functional, phenotypic and contextual information is essential for supporting isoform-based modeling and analysis. Such integrative proteogenomics approaches promise to provide insights into the functions of the alternatively spliced protein isoforms and provide high-confidence hypotheses to be validated experimentally. This manuscript provides a survey of the public databases supporting isoform-based biology. It also presents an overview of the potential global impact of AS on the human canonical gene functions, molecular interactions and cellular pathways.
Affiliation(s)
- Dinanath Sulakhe
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Mark D'Souza
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
- Sheng Wang
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
- Sandhya Balasubramanian
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Genentech, Inc. 1 DNA Way, Mail Stop: 35-6J, South San Francisco, CA, USA
- Prashanth Athri
  - Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, Kasavanahalli, Carmelaram P.O., Bengaluru, Karnataka, India
- Bingqing Xie
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- Stefan Canzar
  - Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL, USA
  - Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Gady Agam
  - Department of Computer Science, Illinois Institute of Technology, Chicago, IL, USA
- T Conrad Gilliam
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
- Natalia Maltsev
  - Department of Human Genetics, University of Chicago, 920 E. 58th Street, Chicago, IL, USA
  - Computation Institute, University of Chicago, 5735 S. Ellis Avenue, Chicago, IL, USA
4
Nigenda‐Morales SF, Hu Y, Beasley JC, Ruiz‐Piña HA, Valenzuela‐Galván D, Wayne RK. Transcriptomic analysis of skin pigmentation variation in the Virginia opossum (Didelphis virginiana). Mol Ecol 2018; 27:2680-2697. [DOI: 10.1111/mec.14712] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/27/2017] [Revised: 04/05/2018] [Accepted: 04/17/2018] [Indexed: 12/19/2022]
Affiliation(s)
- Sergio F. Nigenda‐Morales
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California
- Yibo Hu
  - Key Lab of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Chaoyang, Beijing, China
- James C. Beasley
  - Savannah River Ecology Lab, Warnell School of Forestry and Natural Resources, University of Georgia, Aiken, South Carolina
- Hugo A. Ruiz‐Piña
  - Centro de Investigaciones Regionales “Dr. Hideyo Noguchi”, Universidad Autónoma de Yucatán, Mérida, Yucatán, Mexico
- David Valenzuela‐Galván
  - Departamento de Ecología Evolutiva, Centro de Investigación en Biodiversidad y Conservación, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, Mexico
- Robert K. Wayne
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California
5
Nakamura Y, Kudo T, Terashima S, Saito M, Nambara E, Yano K. CATchUP: A Web Database for Spatiotemporally Regulated Genes. Plant Cell Physiol 2017; 58:e3. [PMID: 28013273 DOI: 10.1093/pcp/pcw199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/28/2016] [Accepted: 11/06/2016] [Indexed: 06/06/2023]
Abstract
For proper control of biological activity, some key genes are highly expressed in a particular spatiotemporal domain. Mining of such spatiotemporally expressed genes using large-scale gene expression data derived from a broad range of experimental sources facilitates our understanding of genome-scale functional gene networks. However, comprehensive information on spatiotemporally expressed genes is lacking in plants. To collect such information, we devised a new index, Δdmax, which is the maximum difference in relative gene expression levels between sample runs which are neighboring when sorted by the levels. Employing this index, we comprehensively evaluated transcripts using large-scale RNA sequencing (RNA-Seq) data stored in the Sequence Read Archive for eight plant species: Arabidopsis thaliana (Arabidopsis), Solanum lycopersicum (tomato), Solanum tuberosum (potato), Oryza sativa (rice), Sorghum bicolor (sorghum), Vitis vinifera (grape), Medicago truncatula (Medicago), and Glycine max (soybean). Based on the frequency distribution of the Δdmax values, approximately 70,000 transcripts showing 0.3 or larger Δdmax values were extracted for the eight species. Information on these genes including the Δdmax values, functional annotations, conservation among species, and experimental conditions where the genes show high expression levels is provided in a new database, CATchUP (http://plantomics.mind.meiji.ac.jp/CATchUP). The CATchUP database assists in identifying genes specifically expressed under particular conditions with powerful search functions and an intuitive graphical user interface.
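The Δdmax index described in this abstract can be sketched in a few lines. This is a minimal illustration, not the CATchUP implementation: it assumes "relative" expression means each run's level divided by the gene's maximum across runs, and the function name `delta_dmax` is hypothetical.

```python
def delta_dmax(expression_levels):
    """Largest gap between neighbouring values once relative levels are sorted.

    Assumption: relative level = level / maximum level across runs.
    """
    if not expression_levels:
        return 0.0
    peak = max(expression_levels)
    if peak <= 0:
        return 0.0  # gene not expressed in any run
    relative = sorted(level / peak for level in expression_levels)
    # Differences between each pair of neighbours in the sorted list.
    gaps = (high - low for low, high in zip(relative, relative[1:]))
    return max(gaps, default=0.0)


# A gene that is strongly expressed in a single run shows one large jump.
specific_gene = [1, 2, 3, 100]
# A gene whose expression rises evenly across runs shows only small jumps.
graded_gene = [10, 20, 30, 40]

for name, levels in (("specific", specific_gene), ("graded", graded_gene)):
    score = delta_dmax(levels)
    print(name, round(score, 2), score >= 0.3)  # 0.3 is the cutoff quoted in the abstract
```

A large value means the sorted runs split into a low group and a high group, which is the signature of expression restricted to particular conditions.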
Affiliation(s)
- Yukino Nakamura
  - Bioinformatics Laboratory, School of Agriculture, Meiji University, Higashi-mita, Tama-ku, Kawasaki, Kanagawa, Japan
- Toru Kudo
  - Bioinformatics Laboratory, School of Agriculture, Meiji University, Higashi-mita, Tama-ku, Kawasaki, Kanagawa, Japan
- Shin Terashima
  - Bioinformatics Laboratory, School of Agriculture, Meiji University, Higashi-mita, Tama-ku, Kawasaki, Kanagawa, Japan
- Misa Saito
  - Bioinformatics Laboratory, School of Agriculture, Meiji University, Higashi-mita, Tama-ku, Kawasaki, Kanagawa, Japan
- Eiji Nambara
  - Department of Cell & Systems Biology, University of Toronto, Willcocks Street, Toronto, Ontario, Canada
- Kentaro Yano
  - Bioinformatics Laboratory, School of Agriculture, Meiji University, Higashi-mita, Tama-ku, Kawasaki, Kanagawa, Japan
6
Comparison of FDA Approved Kinase Targets to Clinical Trial Ones: Insights from Their System Profiles and Drug-Target Interaction Networks. Biomed Res Int 2016; 2016:2509385. [PMID: 27547755 PMCID: PMC4980536 DOI: 10.1155/2016/2509385] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 01/31/2016] [Revised: 06/14/2016] [Accepted: 06/28/2016] [Indexed: 12/21/2022]
Abstract
Kinase is one of the most productive classes of established targets, but the majority of approved drugs against kinase were developed only for cancer. Intensive efforts were therefore exerted for releasing its therapeutic potential by discovering new therapeutic area. Kinases in clinical trial could provide great opportunities for treating various diseases. However, no systematic comparison between system profiles of established targets and those of clinical trial ones was conducted. The reveal of probable difference or shift of trend would help to identify key factors defining druggability of established targets. In this study, a comparative analysis of system profiles of both types of targets was conducted. Consequently, the systems profiles of the majority of clinical trial kinases were identified to be very similar to those of established ones, but percentages of established targets obeying the system profiles appeared to be slightly but consistently higher than those of clinical trial targets. Moreover, a shift of trend in the system profiles from the clinical trial to the established targets was identified, and popular kinase targets were discovered. In sum, this comparative study may help to facilitate the identification of the druggability of established drug targets by their system profiles and drug-target interaction networks.
7
Santos A, Tsafou K, Stolte C, Pletscher-Frankild S, O’Donoghue SI, Jensen LJ. Comprehensive comparison of large-scale tissue expression datasets. PeerJ 2015; 3:e1054. [PMID: 26157623 PMCID: PMC4493645 DOI: 10.7717/peerj.1054] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 02/16/2015] [Accepted: 06/04/2015] [Indexed: 01/01/2023] Open
Abstract
For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface.
Affiliation(s)
- Alberto Santos
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kalliopi Tsafou
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Christian Stolte
  - Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, Australia
- Sune Pletscher-Frankild
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Seán I. O’Donoghue
  - Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, Australia
  - Garvan Institute of Medical Research, Sydney, Australia
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
8
Anafi RC, Lee Y, Sato TK, Venkataraman A, Ramanathan C, Kavakli IH, Hughes ME, Baggs JE, Growe J, Liu AC, Kim J, Hogenesch JB. Machine learning helps identify CHRONO as a circadian clock component. PLoS Biol 2014; 12:e1001840. [PMID: 24737000 PMCID: PMC3988006 DOI: 10.1371/journal.pbio.1001840] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Received: 08/27/2013] [Accepted: 03/07/2014] [Indexed: 12/03/2022] Open
Abstract
Over the last decades, researchers have characterized a set of "clock genes" that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics.
Affiliation(s)
- Ron C. Anafi
  - Division of Sleep Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Center for Sleep and Circadian Neurobiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Yool Lee
  - Department of Pharmacology and the Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Trey K. Sato
  - Department of Pharmacology and the Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Anand Venkataraman
  - Department of Pharmacology and the Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Chidambaram Ramanathan
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee, United States of America
- Ibrahim H. Kavakli
  - Department of Chemical and Biological Engineering, Koc University, Istanbul, Turkey
- Michael E. Hughes
  - Department of Biology, University of Missouri–St. Louis, St. Louis, Missouri, United States of America
- Julie E. Baggs
  - Department of Pharmacology, Morehouse School of Medicine, Atlanta, Georgia, United States of America
- Jacqueline Growe
  - Division of Sleep Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Center for Sleep and Circadian Neurobiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Andrew C. Liu
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee, United States of America
- Junhyong Kim
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- John B. Hogenesch
  - Center for Sleep and Circadian Neurobiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
  - Department of Pharmacology and the Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
9
PaGenBase: a pattern gene database for the global and dynamic understanding of gene function. PLoS One 2013; 8:e80747. [PMID: 24312499 PMCID: PMC3846610 DOI: 10.1371/journal.pone.0080747] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.1] [Received: 07/31/2013] [Accepted: 10/07/2013] [Indexed: 11/30/2022] Open
Abstract
Pattern genes are a group of genes that have a modularized expression behavior under serial physiological conditions. The identification of pattern genes will provide a path toward a global and dynamic understanding of gene functions and their roles in particular biological processes or events, such as development and pathogenesis. In this study, we present PaGenBase, a novel repository for the collection of tissue- and time-specific pattern genes, including specific genes, selective genes, housekeeping genes and repressed genes. The PaGenBase database is now freely accessible at http://bioinf.xmu.edu.cn/PaGenBase/. In the current version (PaGenBase 1.0), the database contains 906,599 pattern genes derived from the literature or from data mining of more than 1,145,277 gene expression profiles in 1,062 distinct samples collected from 11 model organisms. Four statistical parameters were used to quantitatively evaluate the pattern genes. Moreover, three methods (quick search, advanced search and browse) were designed for rapid and customized data retrieval. The potential applications of PaGenBase are also briefly described. In summary, PaGenBase will serve as a resource for the global and dynamic understanding of gene function and will facilitate high-level investigations in a variety of fields, including the study of development, pathogenesis and novel drug discovery.
10
Shen-Orr SS, Gaujoux R. Computational deconvolution: extracting cell type-specific information from heterogeneous samples. Curr Opin Immunol 2013; 25:571-8. [PMID: 24148234 DOI: 10.1016/j.coi.2013.09.015] [Citation(s) in RCA: 203] [Impact Index Per Article: 16.9] [Received: 07/21/2013] [Revised: 09/22/2013] [Accepted: 09/30/2013] [Indexed: 12/31/2022]
Abstract
The quanta unit of the immune system is the cell, yet analyzed samples are often heterogeneous with respect to cell subsets which can mislead result interpretation. Experimentally, researchers face a difficult choice whether to profile heterogeneous samples with the ensuing confounding effects, or a priori focus on a few cell subsets of interest, potentially limiting new discoveries. An attractive alternative solution is to extract cell subset-specific information directly from heterogeneous samples via computational deconvolution techniques, thereby capturing both cell-centered and whole system level context. Such approaches are capable of unraveling novel biology, undetectable otherwise. Here we review the present state of available deconvolution techniques, their advantages and limitations, with a focus on blood expression data and immunological studies in general.
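One common way to frame deconvolution is as a linear mixture: the measured profile of a heterogeneous sample is modeled as a weighted sum of cell-type signature profiles, and the weights are the cell-type proportions. The sketch below assumes that framing, which the abstract does not spell out, and handles only two cell types with an ordinary least-squares fit. The function name and the signature values are hypothetical.

```python
def estimate_proportions(mixture, signature_a, signature_b):
    """Fit mixture ~ pa * signature_a + pb * signature_b by least squares.

    Solves the 2x2 normal equations directly, clamps negative weights to zero,
    and rescales the weights so that they sum to 1.
    """
    aa = sum(a * a for a in signature_a)
    bb = sum(b * b for b in signature_b)
    ab = sum(a * b for a, b in zip(signature_a, signature_b))
    am = sum(a * m for a, m in zip(signature_a, mixture))
    bm = sum(b * m for b, m in zip(signature_b, mixture))

    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("signatures are collinear; proportions are not identifiable")

    pa = (am * bb - bm * ab) / det
    pb = (bm * aa - am * ab) / det
    pa, pb = max(pa, 0.0), max(pb, 0.0)  # crude non-negativity constraint
    total = pa + pb
    return (pa / total, pb / total) if total > 0 else (0.0, 0.0)


# Made-up signatures for two cell types over three marker genes.
cell_type_a = [10.0, 1.0, 5.0]
cell_type_b = [2.0, 8.0, 5.0]
# Simulated sample containing 70% of type A and 30% of type B.
sample = [0.7 * a + 0.3 * b for a, b in zip(cell_type_a, cell_type_b)]

print(estimate_proportions(sample, cell_type_a, cell_type_b))
```

Real methods generally handle many cell types, noise, and non-negativity constraints more carefully than this two-component fit does.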
Affiliation(s)
- Shai S Shen-Orr
  - Rappaport Institute of Medical Research, Technion-Israel Institute of Technology, Haifa 31096, Israel
  - Department of Immunology, Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel
  - Faculty of Biology, Technion-Israel Institute of Technology, Haifa 31096, Israel
11
Milnthorpe AT, Soloviev M. The use of EST expression matrixes for the quality control of gene expression data. PLoS One 2012; 7:e32966. [PMID: 22412959 PMCID: PMC3297614 DOI: 10.1371/journal.pone.0032966] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2011] [Accepted: 02/06/2012] [Indexed: 01/10/2023] Open
Abstract
EST expression profiling provides an attractive tool for studying differential gene expression, but cDNA libraries' origins and EST data quality are not always known or reported. Libraries may originate from pooled or mixed tissues; EST clustering, EST counts, library annotations and analysis algorithms may contain errors. Traditional data analysis methods, including research into tissue-specific gene expression, assume EST counts to be correct and libraries to be correctly annotated, which is not always the case. Therefore, a method capable of assessing the quality of expression data based on that data alone would be invaluable for assessing the quality of EST data and determining their suitability for mRNA expression analysis. Here we report an approach to the selection of a small generic subset of 244 UniGene clusters suitable for identification of the tissue of origin for EST libraries and quality control of the expression data using EST expression information alone. We created a small expression matrix of UniGene IDs using two rounds of selection followed by two rounds of optimisation. Our selection procedures differ from traditional approaches to finding "tissue-specific" genes and our matrix yields consistently high positive correlation values for libraries with confirmed tissues of origin and can be applied for tissue typing and quality control of libraries as small as just a few hundred total ESTs. Furthermore, we can pick up tissue correlations between related tissues e.g. brain and peripheral nervous tissue, heart and muscle tissues and identify tissue origins for a few libraries of uncharacterised tissue identity. It was possible to confirm tissue identity for some libraries which have been derived from cancer tissues or have been normalised. Tissue matching is affected strongly by cancer progression or library normalisation and our approach may potentially be applied for elucidating the stage of normalisation in normalised libraries or for cancer staging.
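Correlation-based tissue typing of the kind this abstract describes can be illustrated with a toy example. The sketch assumes Pearson correlation between a library's EST counts and per-tissue reference profiles over the same marker clusters. The abstract does not name the correlation measure, and the function names, tissues, and counts below are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_matching_tissue(library_counts, reference_profiles):
    """Score a library against every reference tissue and return the best match."""
    scores = {
        tissue: pearson(library_counts, profile)
        for tissue, profile in reference_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical reference profiles over four marker clusters.
reference_profiles = {
    "brain": [9, 1, 0, 2],
    "heart": [0, 8, 7, 1],
}
# Hypothetical EST counts for a library of unknown origin.
library_counts = [30, 2, 1, 5]

tissue, scores = best_matching_tissue(library_counts, reference_profiles)
print(tissue, {name: round(value, 3) for name, value in scores.items()})
```

A library that correlates poorly with every reference tissue would be a candidate for the quality-control concerns the abstract raises, such as pooled tissues or annotation errors.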
Affiliation(s)
- Andrew T. Milnthorpe
  - School of Biological Sciences, CBMS, Royal Holloway University of London, Egham, Surrey, United Kingdom
- Mikhail Soloviev
  - School of Biological Sciences, CBMS, Royal Holloway University of London, Egham, Surrey, United Kingdom
12
Abstract
BACKGROUND Each organ has a specific function in the body. "Organ-specificity" refers to differential expressions of the same gene across different organs. An organ-specific gene/protein is defined as a gene/protein whose expression is significantly elevated in a specific human organ. An "organ-specific marker" is defined as an organ-specific gene/protein that is also implicated in human diseases related to the organ. Previous studies have shown that identifying specificity for the organ in which a gene or protein is significantly differentially expressed can lead to discovery of its function. Most currently available resources for organ-specific genes/proteins either allow users to access tissue-specific expression over a limited range of organs, or do not contain disease information such as disease-organ relationship and disease-gene relationship. RESULTS We designed an integrated Human Organ-specific Molecular Electronic Repository (HOMER, http://bio.informatics.iupui.edu/homer), defining human organ-specific genes/proteins, based on five criteria: 1) comprehensive organ coverage; 2) gene/protein to disease association; 3) disease-organ association; 4) quantification of organ-specificity; and 5) cross-linking of multiple available data sources. HOMER is a comprehensive database covering about 22,598 proteins, 52 organs, and 4,290 diseases integrated and filtered from organ-specific proteins/genes and disease databases like dbEST, TiSGeD, HPA, CTD, and Disease Ontology. The database has a Web-based user interface that allows users to find organ-specific genes/proteins by gene, protein, organ or disease, to explore the histogram of an organ-specific gene/protein, and to identify disease-related organ-specific genes by browsing the disease data online. Moreover, the quality of the database was validated with comparison to other known databases and two case studies: 1) an association analysis of organ-specific genes with disease and 2) a gene set enrichment analysis of organ-specific gene expression data. CONCLUSIONS HOMER is a new resource for analyzing, identifying, and characterizing organ-specific molecules in association with disease-organ and disease-gene relationships. The statistical method we developed for organ-specific gene identification can be applied to other organisms. The current HOMER database can successfully answer a variety of questions related to organ specificity in human diseases and can help researchers in discovering and characterizing organ-specific genes/proteins with disease relevance.
Affiliation(s)
- Fan Zhang
- School of Informatics, Indiana University, Indianapolis, IN 46202, USA
|
13
|
Gremse M, Chang A, Schomburg I, Grote A, Scheer M, Ebeling C, Schomburg D. The BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms for enzyme sources. Nucleic Acids Res 2010; 39:D507-13. [PMID: 21030441 PMCID: PMC3013802 DOI: 10.1093/nar/gkq968] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Indexed: 11/14/2022]
Abstract
BTO, the BRENDA Tissue Ontology (http://www.BTO.brenda-enzymes.org), represents a comprehensive structured encyclopedia of tissue terms. The project started in 2003 to create a connection between the enzyme data collection of the BRENDA enzyme database and a structured network of source tissues and cell types. Currently, BTO contains more than 4600 different anatomical structures, tissues, cell types and cell lines, classified under generic categories corresponding to the rules and formats of the Gene Ontology Consortium and organized as a directed acyclic graph (DAG). Most of the terms are endowed with comments on their derivation or definitions. The content of the ontology is constantly curated, with ∼1000 new terms added each year. Four different types of relationships between the terms are implemented. A versatile web interface with several search and navigation functionalities allows convenient online access to the BTO and to the enzymes isolated from the tissues. Important areas of application of the BTO terms are the detection of enzymes in tissues and the provision of a solid basis for text-mining approaches in this field. It is widely used by lab scientists, curators of genomic and biochemical databases and bioinformaticians. The BTO is freely available at http://www.obofoundry.org.
Affiliation(s)
- Marion Gremse
- Technische Universität Braunschweig, Institute for Bioinformatics and Biochemistry, Langer Kamp 19 B, 38106 Braunschweig, Germany
|
14
|
Gellert P, Jenniches K, Braun T, Uchida S. C-It: a knowledge database for tissue-enriched genes. Bioinformatics 2010; 26:2328-33. [PMID: 20628071 DOI: 10.1093/bioinformatics/btq417] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022]
Abstract
MOTIVATION Due to the development of high-throughput technologies such as microarrays, it has become possible to determine genome-wide expression changes in a single experiment. Although much attention has been paid to identifying differentially expressed genes, the functions of tens of thousands of genes in different species still remain unknown. RESULTS C-It is a knowledge database that focuses on 'uncharacterized genes'. C-It contains expression profiles of various tissues from human, mouse, rat, chicken and zebrafish. By applying our previously introduced algorithm DGSA (Database-Dependent Gene Selection and Analysis), it is possible to screen for uncharacterized, tissue-enriched genes in the species mentioned above. C-It is designed to include further expression studies, which might provide more comprehensive coverage of gene expression patterns and tissue-enriched splicing isoforms. We propose that C-It will be an excellent starting point to study uncharacterized genes. AVAILABILITY C-It is freely available online without registration at http://C-It.mpi-bn.mpg.de.
Affiliation(s)
- Pascal Gellert
- Max-Planck-Institute for Heart and Lung Research, Bad Nauheim, Germany
|