1
Parmar G, Chudasama JM, Shah A, Aundhia C, Kardani S. Targeting cell cycle arrest in breast cancer by phytochemicals from Caryto urens L. fruit ethyl acetate fraction: in silico and in vitro validation. J Ayurveda Integr Med 2025; 16:101095. [PMID: 40081286 PMCID: PMC11932863 DOI: 10.1016/j.jaim.2024.101095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 10/25/2024] [Accepted: 10/26/2024] [Indexed: 03/15/2025] Open
Abstract
BACKGROUND Caryota urens, also known as Shivjata, has been documented in ancient Indian texts for its therapeutic benefits, addressing conditions from seminal weakness to gastric ulcers. This study aims to investigate its contemporary medicinal potential in treating breast cancer. OBJECTIVES The study focuses on exploring the therapeutic potential of Caryota urens fruit against breast cancer, specifically targeting cell cycle genes CDK1, CDC25A, and PLK1 through bioinformatics, network pharmacology, and in vitro validation. MATERIALS AND METHODS Using mass spectrometry and nuclear magnetic resonance (NMR), 60 key phytoconstituents from Caryota urens fruit were identified. Through bioinformatics analysis integrating the GeneCards and GEO databases, 15,474 breast cancer-associated genes were identified, focusing on the HR+/HER2- subtype. Molecular docking and qPCR validated the interactions of key phytoconstituents, particularly Episesamin, with CDK1, CDC25A, and PLK1. In vitro studies were conducted on the MCF7 cell line, supplemented by ROC and survival analyses to evaluate diagnostic and therapeutic potential. RESULTS The bioinformatics analysis identified CDK1, CDC25A, and PLK1 as pivotal genes regulating cell cycle progression and breast cancer tumorigenesis. Network pharmacology and in vitro studies indicated that phytoconstituents, especially Episesamin, downregulated these genes in breast cancer cells. Molecular docking and qPCR confirmed these interactions, and ROC and survival analyses underscored their diagnostic and therapeutic significance. CONCLUSIONS This study suggests that Caryota urens fruit extract, particularly Episesamin, may inhibit breast cancer metastasis by downregulating CDK1, CDC25A, and PLK1, offering promising new strategies for targeting the cell cycle in breast cancer and emphasizing the value of integrating bioinformatics with experimental methods in cancer research.
Affiliation(s)
- Ghanshyam Parmar
  - Department of Pharmacy, Sumandeep Vidyapeeth Deemed to be University, Piparia, Waghodia, Vadodara, 391760, Gujarat, India
- Jay Mukesh Chudasama
  - Department of Pharmacy, Sumandeep Vidyapeeth Deemed to be University, Piparia, Waghodia, Vadodara, 391760, Gujarat, India
- Ashish Shah
  - Department of Pharmacy, Sumandeep Vidyapeeth Deemed to be University, Piparia, Waghodia, Vadodara, 391760, Gujarat, India
- Chintan Aundhia
  - Department of Pharmacy, Sumandeep Vidyapeeth Deemed to be University, Piparia, Waghodia, Vadodara, 391760, Gujarat, India
- Sunil Kardani
  - Department of Pharmacy, Sumandeep Vidyapeeth Deemed to be University, Piparia, Waghodia, Vadodara, 391760, Gujarat, India
2
Tian L, Wang SL. Exploring the potential microRNA sponge interactions of breast cancer based on some known interactions. J Bioinform Comput Biol 2020; 18:2050007. [PMID: 32530353 DOI: 10.1142/s0219720020500079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
Abstract
MicroRNA (miRNA) sponges' regulatory mechanisms play an important role in developing human cancer. Herein, we develop a new method to explore potential miRNA sponge interactions (EPMSIs) for breast cancer. Based on some known interactions, and a matching gene expression profile, EPMSIs explored other potential miRNA sponge interactions for breast cancer. Every interaction is inferred with a value representing interaction intensity. Then, we apply a clustering algorithm called BCPlaid to potential interactions. Ten modules are identified; nine of them are closely associated with biological enrichments. When we employ a classification algorithm to separate normal and tumor samples in each module, each module demonstrates powerful classification performance. Furthermore, EPMSI illustrates a new method to explore the miRNA sponge regulatory network for breast cancer by applying its superior performance.
Affiliation(s)
- Lei Tian
  - School of Information Science and Engineering, Hunan University, Changsha, China
- Shu-Lin Wang
  - School of Information Science and Engineering, Hunan University, Changsha, China
3
Xie B, Yuan Z, Yang Y, Sun Z, Zhou S, Fang X. MOBCdb: a comprehensive database integrating multi-omics data on breast cancer for precision medicine. Breast Cancer Res Treat 2018; 169:625-632. [PMID: 29429018 DOI: 10.1007/s10549-018-4708-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/23/2017] [Accepted: 02/03/2018] [Indexed: 12/23/2022]
Abstract
BACKGROUND Breast cancer is one of the most frequently diagnosed cancers among women worldwide, characterized by diverse biological heterogeneity. It is well known that complex and combined gene regulation of multi-omics is involved in the occurrence and development of breast cancer. RESULTS In this paper, we present the Multi-Omics Breast Cancer Database (MOBCdb), a simple and easily accessible repository that integrates genomic, transcriptomic, epigenomic, clinical, and drug response data of different subtypes of breast cancer. MOBCdb allows users to retrieve simple nucleotide variation (SNV), gene expression, microRNA expression, DNA methylation, and specific drug response data by various search fashions. The genome-wide browser /navigation facility in MOBCdb provides an interface for visualizing multi-omics data of multi-samples simultaneously. Furthermore, the survival module provides survival analysis for all or some of the samples by using data of three omics. The approved public drugs with genetic variations on breast cancer are also included in MOBCdb. CONCLUSION In summary, MOBCdb provides users a unique web interface to the integrated multi-omics data of different subtypes of breast cancer, which enables the users to identify potential novel biomarkers for precision medicine.
Affiliation(s)
- Bingbing Xie
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zifeng Yuan
  - Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
- Yadong Yang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhidan Sun
  - Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
- Xiangdong Fang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
4
Zhang J, Le TD, Liu L, Li J. Inferring miRNA sponge co-regulation of protein-protein interactions in human breast cancer. BMC Bioinformatics 2017; 18:243. [PMID: 28482794 PMCID: PMC5423010 DOI: 10.1186/s12859-017-1672-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/14/2017] [Accepted: 05/03/2017] [Indexed: 12/14/2022] Open
Abstract
Background Recent studies have shown that the crosstalk between microRNA (miRNA) sponges plays an important role in human cancers. However, the co-regulation roles of miRNA sponges in protein-protein interactions (PPIs) are still unknown. Results In this study, we propose a multi-step method called miRSCoPPI to infer miRNA sponge co-regulation of PPIs. We focus on investigating breast cancer (BRCA) related miRNA sponge co-regulation, by integrating heterogeneous data, including miRNA, long non-coding RNA (lncRNA) and messenger RNA (mRNA) expression data, experimentally validated miRNA-target interactions, PPIs and lncRNA-target interactions, and the list of breast cancer genes. We find that the inferred BRCA-related miRSCoPPI network is highly connected and scale free. The top 10% hub genes in the BRCA-related miRSCoPPI network have potential biological implications in breast cancer. By utilizing a graph clustering method, we discover 17 BRCA-related miRSCoPPI modules. Through pathway enrichment analysis of the modules, we find that several modules are significantly enriched in pathways associated with breast cancer. Moreover, 10 modules have good performance in classifying breast tumor and normal samples, and can act as module signatures for prognostication. By using putative computationally predicted miRNA-target interactions, we have consistent results with those obtained using experimentally validated miRNA-target interactions, indicating that miRSCoPPI is robust in inferring miRNA sponge co-regulation of PPIs in human breast cancer. Conclusions Taken together, the results demonstrate that miRSCoPPI is a promising tool for inferring BRCA-related miRNA sponge co-regulation of PPIs and it can help with the understanding of the co-regulation roles of miRNA sponges on the PPIs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1672-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Junpeng Zhang
  - School of Engineering, Dali University, Dali, Yunnan, 671003, People's Republic of China
- Thuc Duy Le
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
  - Centre for Cancer Biology, University of South Australia, Adelaide, SA, 5000, Australia
- Lin Liu
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA, 5095, Australia
5
Abstract
Background MicroRNA (miRNA) sponges with multiple tandem miRNA binding sequences can sequester miRNAs from their endogenous target mRNAs. Therefore, a miRNA sponge acting as a decoy is extremely important for long-term loss-of-function studies both in vivo and in silico. Recently, a growing number of in silico methods have been used as an effective technique to generate hypotheses for in vivo methods for studying the biological functions and regulatory mechanisms of miRNA sponges. However, most existing in silico methods only focus on studying miRNA sponge interactions or networks in cancer; the module-level properties of miRNA sponges in cancer are still largely unknown. Results We propose a novel in silico method, called miRSM (miRNA Sponge Module), to infer miRNA sponge modules in breast cancer. We apply miRSM to the breast invasive carcinoma (BRCA) dataset provided by The Cancer Genome Atlas (TCGA), and make functional validation of the computational results. We discover that most miRNA sponge interactions are module-conserved across two modules, and a minority of miRNA sponge interactions are module-specific, existing only in a single module. Through functional annotation and differential expression analysis, we also find that the modules discovered using miRSM are functional miRNA sponge modules associated with BRCA. Moreover, the module-specific miRNA sponge interactions among miRNA sponge modules may be involved in the progression and development of BRCA. Our experimental results show that miRSM is comparable to the benchmark methods in recovering experimentally confirmed miRNA sponge interactions, and miRSM outperforms the benchmark methods in identifying interactions that are related to breast cancer.
Conclusions Altogether, the functional validation results demonstrate that miRSM is a promising method to identify miRNA sponge modules and interactions, and may provide new insights for understanding the roles of miRNA sponges in cancer progression and development. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1467-5) contains supplementary material, which is available to authorized users.
6
Mok SRS, Mohan S, Grewal N, Elfant AB, Judge TA. A genetic database can be utilized to identify potential biomarkers for biphenotypic hepatocellular carcinoma-cholangiocarcinoma. J Gastrointest Oncol 2016; 7:570-9. [PMID: 27563447 DOI: 10.21037/jgo.2016.04.01] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Biphenotypic hepatocellular carcinoma-cholangiocarcinoma (HCC-CC) is an uncommon primary liver neoplasm. Due to limitations in radiologic imaging for the diagnosis of this condition, biopsy is a common method for diagnosis, which is invasive and holds potential complications. To identify alternative means for obtaining the diagnosis and assessing the prognosis of this condition, we evaluated biomarkers for biphenotypic HCC-CC using a genetic database. METHODS To evaluate the genetic associations with each variable we utilized GeneCards®, The Human Gene Compendium (http://www.genecards.org). The results of our search were entered into the Pathway Interaction Database from the National Cancer Institute (PID-NCI) (http://pid.nci.nih.gov), to generate a biomolecule interaction map. RESULTS The results of our query yielded 690 genes for HCC, 98 genes for CC and 50 genes for HCC-CC. Genes depicted in this analysis demonstrate the role of hormonal regulation, embryonic development, cell surface adhesion, cytokeratin stability, mucin production, metalloproteinase regulation, Ras signaling, metabolism and apoptosis. Examples of previously described markers included hepatocyte growth factor (HGF), mesenchymal epithelial transition (MET) and Kirsten rat sarcoma viral oncogene homolog (KRAS). Novel markers included phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA), GPC3, choline kinase alpha (CHKA), prostaglandin-endoperoxide synthase 2 (PTGS2), telomerase reverse transcriptase (TERT), myeloid cell leukemia 1 (MCL1) and N-acetyltransferase 2 (NAT2). CONCLUSIONS GeneCards is a useful research tool in the genetic analysis of low-frequency malignancies. Utilizing this tool, we identified several biomarkers as methods for diagnosing HCC-CC. Finally, utilizing these methods, HCC-CC was found to be predominantly a subtype of CC.
Affiliation(s)
- Shaffer R S Mok
  - Division of Gastroenterology and Liver Diseases, Department of Medicine, Cooper Medical School of Rowan University, MD Anderson Cancer Center at Cooper, Mount Laurel, NJ, USA
- Sachin Mohan
  - Division of Gastroenterology and Liver Diseases, Department of Medicine, Cooper Medical School of Rowan University, MD Anderson Cancer Center at Cooper, Mount Laurel, NJ, USA
- Navjot Grewal
  - Division of Gastroenterology and Liver Diseases, Department of Medicine, Cooper Medical School of Rowan University, MD Anderson Cancer Center at Cooper, Mount Laurel, NJ, USA
- Adam B Elfant
  - Division of Gastroenterology and Liver Diseases, Department of Medicine, Cooper Medical School of Rowan University, MD Anderson Cancer Center at Cooper, Mount Laurel, NJ, USA
- Thomas A Judge
  - Division of Gastroenterology and Liver Diseases, Department of Medicine, Cooper Medical School of Rowan University, MD Anderson Cancer Center at Cooper, Mount Laurel, NJ, USA
7
Le TD, Zhang J, Liu L, Li J. Computational methods for identifying miRNA sponge interactions. Brief Bioinform 2016; 18:577-590. [DOI: 10.1093/bib/bbw042] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 01/29/2016] [Indexed: 12/14/2022] Open
8
Zhang J, Le TD, Liu L, He J, Li J. A novel framework for inferring condition-specific TF and miRNA co-regulation of protein-protein interactions. Gene 2015; 577:55-64. [PMID: 26611531 DOI: 10.1016/j.gene.2015.11.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/10/2015] [Revised: 10/16/2015] [Accepted: 11/17/2015] [Indexed: 12/11/2022]
Abstract
Recent studies have shown that transcription factors (TFs) and microRNAs (miRNAs), while independently regulate their downstream targets, collaborate with each other to regulate gene expression. However, their synergistic roles in protein-protein interactions (PPIs) remain mostly unknown. In this paper, we present a novel framework (called CoRePPI) for inferring TF and miRNA co-regulation of PPIs. Particularly, CoRePPI is aimed at discovering the co-regulation specific to a condition of interest, by using heterogeneous data, including miRNA and messenger RNA (mRNA) expression profiles, putative miRNA targets, TF targets and PPIs. CoRePPI firstly finds the network motifs indicating the co-regulation of PPIs by TFs and miRNAs in tumor and normal conditions separately. Then by identifying the differential motifs found in one condition but not in the other, it builds the networks consisting of TFs, miRNAs and their co-regulated PPIs specific to different conditions respectively. To validate CoRePPI, we apply it to the Pan-Cancer dataset which includes the expression profiles of 12 cancer types from TCGA. Through network topology analysis, we found that the tumor and normal CoRePPI networks are scale-free. Furthermore, the results of differential and intersected network analysis between the tumor and normal CoRePPI networks suggest that only a small fraction of the regulatory relationships between TFs and miRNAs are conserved in both conditions but they co-regulate different downstream PPIs in tumor and normal conditions; and in different conditions the majority of the regulatory relationships between TFs and miRNAs are different although they may regulate the same PPIs in their respective conditions. The CoRePPI sub-networks constructed for the three types of cancers (breast cancer, lung cancer and ovarian cancer) are all scale-free, and the intersection of these CoRePPI sub-networks can be utilized as the biomarker CoRePPI sub-network of the three types of cancers. 
The PPI enrichment analyses of the tumor and normal CoRePPI networks suggest that the co-regulating TFs and miRNAs are significantly associated with the specific biological processes, diseases and pathways. In addition, comparing with the two non-condition-specific approaches, the tumor CoRePPI network is found to have the most enriched cancer-related PPIs. Altogether, the results uncover the combined regulatory patterns of TFs and miRNAs on the PPIs, and may provide new insights for research in cancer-associated TFs and miRNAs.
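The differential step described in this abstract (keeping motifs found in one condition but not the other) can be illustrated with plain set operations. The following is only a minimal sketch under stated assumptions: the function name `differential_motifs`, the tuple layout `(TF, miRNA, PPI pair)`, and all identifiers such as `TF_A` or `miR_x` are hypothetical placeholders, not CoRePPI's actual data structures or results.

```python
def differential_motifs(tumor_motifs, normal_motifs):
    """Split co-regulation motifs into condition-specific and conserved sets.

    Each motif is assumed to be a hashable tuple (tf, mirna, ppi), where
    ppi is a frozenset of two protein names so that the pair is undirected.
    """
    tumor, normal = set(tumor_motifs), set(normal_motifs)
    return {
        "tumor_specific": tumor - normal,   # found only in tumor samples
        "normal_specific": normal - tumor,  # found only in normal samples
        "conserved": tumor & normal,        # found in both conditions
    }


# Hypothetical placeholder motifs, for illustration only.
tumor = [
    ("TF_A", "miR_x", frozenset({"P1", "P2"})),
    ("TF_B", "miR_y", frozenset({"P3", "P4"})),
]
normal = [
    ("TF_A", "miR_x", frozenset({"P2", "P1"})),  # same pair, other order
    ("TF_C", "miR_z", frozenset({"P5", "P6"})),
]

result = differential_motifs(tumor, normal)
for label, motifs in result.items():
    print(label, len(motifs))
```

Representing each PPI as a `frozenset` is a design choice of this sketch: it makes the pair order-independent, so the same interaction is recognised in both conditions regardless of how it was listed.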
Affiliation(s)
- Junpeng Zhang
  - School of Engineering, Dali University, Dali, Yunnan 671003, China
- Thuc Duy Le
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA 5095, Australia
- Lin Liu
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA 5095, Australia
- Jianfeng He
  - School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
- Jiuyong Li
  - School of Information Technology and Mathematical Sciences, University of South Australia, Mawson Lakes, SA 5095, Australia
9
Zhang D, Zhu R, Zhang H, Zheng CH, Xia J. MGDB: a comprehensive database of genes involved in melanoma. Database (Oxford) 2015; 2015:bav097. [PMID: 26424083 PMCID: PMC4589692 DOI: 10.1093/database/bav097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/18/2015] [Accepted: 09/07/2015] [Indexed: 12/14/2022]
Abstract
The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts and cover, to date, 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp
Affiliation(s)
- Di Zhang
  - Institute of Health Sciences, School of Computer Science and Technology
- Rongrong Zhu
  - Institute of Health Sciences, School of Computer Science and Technology
- Hanqian Zhang
  - Institute of Health Sciences, School of Computer Science and Technology
- Chun-Hou Zheng
  - College of Electrical Engineering and Automation and Center of Information Support and Assurance Technology, Anhui University, Hefei, Anhui 230601, China
- Junfeng Xia
  - Institute of Health Sciences, School of Computer Science and Technology, Center of Information Support and Assurance Technology, Anhui University, Hefei, Anhui 230601, China
10
Liu Y, Liang Y, Wishart D. PolySearch2: a significantly improved text-mining system for discovering associations between human diseases, genes, drugs, metabolites, toxins and more. Nucleic Acids Res 2015; 43:W535-42. [PMID: 25925572 PMCID: PMC4489268 DOI: 10.1093/nar/gkv383] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 01/31/2015] [Accepted: 04/11/2015] [Indexed: 02/01/2023] Open
Abstract
PolySearch2 (http://polysearch.ca) is an online text-mining system for identifying relationships between biomedical entities such as human diseases, genes, SNPs, proteins, drugs, metabolites, toxins, metabolic pathways, organs, tissues, subcellular organelles, positive health effects, negative health effects, drug actions, Gene Ontology terms, MeSH terms, ICD-10 medical codes, biological taxonomies and chemical taxonomies. PolySearch2 supports a generalized ‘Given X, find all associated Ys’ query, where X and Y can be selected from the aforementioned biomedical entities. An example query might be: ‘Find all diseases associated with Bisphenol A’. To find its answers, PolySearch2 searches for associations against comprehensive free-text collections, including local versions of MEDLINE abstracts, PubMed Central full-text articles, Wikipedia full-text articles and US Patent application abstracts. PolySearch2 also searches 14 widely used, text-rich biological databases such as UniProt, DrugBank and Human Metabolome Database to improve its accuracy and coverage. PolySearch2 maintains an extensive thesaurus of biological terms and exploits the latest search engine technology to rapidly retrieve relevant articles and database records. PolySearch2 also generates, ranks and annotates associative candidates and presents results with relevancy statistics and highlighted key sentences to facilitate user interpretation.
Affiliation(s)
- Yifeng Liu
  - Department of Computing Science, University of Alberta, Edmonton, T6G 2E8 Canada
- Yongjie Liang
  - Department of Computing Science, University of Alberta, Edmonton, T6G 2E8 Canada
- David Wishart
  - Department of Computing Science, University of Alberta, Edmonton, T6G 2E8 Canada
  - Department of Biological Science, University of Alberta, Edmonton, T6G 2E6 Canada
11
Pavlopoulou A, Spandidos DA, Michalopoulos I. Human cancer databases (review). Oncol Rep 2014; 33:3-18. [PMID: 25369839 PMCID: PMC4254674 DOI: 10.3892/or.2014.3579] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 10/03/2014] [Accepted: 10/31/2014] [Indexed: 12/20/2022] Open
Abstract
Cancer is one of the four major non‑communicable diseases (NCD), responsible for ~14.6% of all human deaths. Currently, there are >100 different known types of cancer and >500 genes involved in cancer. Ongoing research efforts have been focused on cancer etiology and therapy. As a result, there is an exponential growth of cancer‑associated data from diverse resources, such as scientific publications, genome‑wide association studies, gene expression experiments, gene‑gene or protein‑protein interaction data, enzymatic assays, epigenomics, immunomics and cytogenetics, stored in relevant repositories. These data are complex and heterogeneous, ranging from unprocessed, unstructured data in the form of raw sequences and polymorphisms to well‑annotated, structured data. Consequently, the storage, mining, retrieval and analysis of these data in an efficient and meaningful manner pose a major challenge to biomedical investigators. In the current review, we present the central, publicly accessible databases that contain data pertinent to cancer, the resources available for delivering and analyzing information from these databases, as well as databases dedicated to specific types of cancer. Examples for this wealth of cancer‑related information and bioinformatic tools have also been provided.
Affiliation(s)
- Athanasia Pavlopoulou
  - Center of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Demetrios A Spandidos
  - Laboratory of Clinical Virology, Medical School, University of Crete, Heraklion 71003, Crete, Greece
- Ioannis Michalopoulos
  - Center of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
12
Srivastava A, Kumar S, Ramaswamy R. Two-layer modular analysis of gene and protein networks in breast cancer. BMC Syst Biol 2014; 8:81. [PMID: 24997799 PMCID: PMC4105126 DOI: 10.1186/1752-0509-8-81] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/03/2014] [Accepted: 06/26/2014] [Indexed: 02/05/2023]
Abstract
Background Genomic, proteomic and high-throughput gene expression data, when integrated, can be used to map the interaction networks between genes and proteins. Different approaches have been used to analyze these networks, especially in cancer, where mutations in biologically related genes that encode mutually interacting proteins are believed to be involved. This system of integrated networks as a whole exhibits emergent biological properties that are not obvious at the individual network level. We analyze the system in terms of modules, namely a set of densely interconnected nodes that can be further divided into submodules that are expected to participate in multiple biological activities in coordinated manner. Results In the present work we construct two layers of the breast cancer network: the gene layer, where the correlation network of breast cancer genes is analyzed to identify gene modules, and the protein layer, where each gene module is extended to map out the network of expressed proteins and their interactions in order to identify submodules. Each module and its associated submodules are analyzed to test the robustness of their topological distribution. The constituent biological phenomena are explored through the use of the Gene Ontology. We thus construct a “network of networks”, and demonstrate that both the gene and protein interaction networks are modular in nature. By focusing on the ontological classification, we are able to determine the entire GO profiles that are distributed at different levels of hierarchy. Within each submodule most of the proteins are biologically correlated, and participate in groups of distinct biological activities. Conclusions The present approach is an effective method for discovering coherent gene modules and protein submodules. 
We show that this also provides a means of determining biological pathways (both novel ones and those that have been reported previously) that are related, in the present instance, to breast cancer. Similar strategies are likely to be useful in the analysis of other diseases as well.
Affiliation(s)
- Alok Srivastava
  - C R RAO Advanced Institute of Mathematics, Statistics and Computer Science, University of Hyderabad Campus, Hyderabad 500046, India
13
Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014; 83:605-23. [PMID: 25008281 DOI: 10.1016/j.ijmedinf.2014.06.009] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Received: 10/23/2013] [Revised: 06/12/2014] [Accepted: 06/14/2014] [Indexed: 12/21/2022]
Abstract
PURPOSE This paper reviews the research literature on text mining (TM) with the aim to find out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research. METHODS A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar. RESULTS A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.
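The dictionary-lookup style of NER that this abstract mentions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon, its labels, the function name `dictionary_ner` and the sample report are all hypothetical and are not taken from the reviewed systems.

```python
import re

# Hypothetical mini-lexicon; a real system would load a curated terminology.
LEXICON = {
    "breast cancer": "DISEASE",
    "carcinoma": "DISEASE",
    "tamoxifen": "DRUG",
    "her2": "GENE",
}

def dictionary_ner(text, lexicon=LEXICON):
    """Return (start, end, surface form, label) for each lexicon term found."""
    matches = []
    for term, label in lexicon.items():
        # Word boundaries prevent matches inside longer tokens.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0), label))
    return sorted(matches)

# Invented example sentence, not real patient data.
report = "Invasive ductal carcinoma, HER2 positive; patient started on tamoxifen."
entities = dictionary_ner(report)
for start, end, surface, label in entities:
    print(f"{label:8s} {surface!r} at {start}-{end}")
```

A lookup of this kind finds only the exact surface forms in the lexicon, which is one way to see why non-standard abbreviations and misspellings in clinical narratives limit the symbolic approaches described above.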
|
14
|
Shashni B, Sakharkar KR, Nagasaki Y, Sakharkar MK. Glycolytic enzymes PGK1 and PKM2 as novel transcriptional targets of PPARγ in breast cancer pathophysiology. J Drug Target 2013; 21:161-74. [PMID: 23130662 DOI: 10.3109/1061186x.2012.736998] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022]
Abstract
Peroxisome proliferator-activated receptor γ (PPARγ) is a nuclear receptor and plays important roles in breast cancer cell proliferation. The complexity of the underlying biochemical and molecular mechanisms of breast cancer and the involvement of PPARγ in breast cancer pathophysiology are unclear. In this study, we carried out prediction of the peroxisome proliferator response element (PPRE) motifs in 2332 genes reported to be involved in breast cancer in literature. A total of 178 genes were found to have PPRE (DR1/DR2) and/or PPAR-associated conserved motif (PACM) motifs. We further constructed protein-protein interaction network, disease gene network and gene ontology (GO) analyses to identify novel key genes for experimental validation. We identified two genes in the glycolytic pathway (phosphoglycerate kinase 1 (PGK1) and pyruvate kinase M2 (PKM2)) at the ATP production steps and experimentally validated their repression by PPARγ in two breast cancer cell lines MDA-MB-231 and MCF-7. Further analysis suggested that this repression leads to decrease in ATP levels and apoptosis. These investigations will help us in understanding the molecular mechanisms by which PPARγ regulates the cellular energy pathway and the use of its ligands in human breast cancer therapeutics.
Affiliation(s)
- Babita Shashni
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
|
15
|
Chand Y, Alam MA. Network biology approach for identifying key regulatory genes by expression based study of breast cancer. Bioinformation 2012; 8:1132-8. [PMID: 23275709 PMCID: PMC3530881 DOI: 10.6026/97320630081132] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 09/30/2012] [Accepted: 11/03/2012] [Indexed: 11/23/2022] Open
Abstract
The use of high-throughput array technology is omnipresent in diverse areas, specifically early diagnosis of disease, discovery of infectious agents, search for biological markers and screening of potential drug candidates. Here, we integrated gene expression data with a network-based approach to identify novel genes that play a central role in the network by interconnecting a number of differentially expressed breast cancer genes. The 62 cancerous genes retrieved from the Breast Cancer Gene Database (BCGD) were mapped in the normalized data accessed from the Stanford Microarray Database (SMD) to analyze their pattern. Interaction networks for each gene were constructed to understand the biology of metastasis at the systems level. The individual networks were fused together for the detection of interacting hubs; 38 novel genes were found to be deeply intermingled with the central hub node. Gene Ontology studies were made to depict the biology of the hub nodes, not through gene ranking alone but by applying the hypergeometric test with the Benjamini-Hochberg False Discovery Rate (FDR) correction method at a significance level of 0.05. Analyzing p-values from the statistical test indicated that most of the novel genes were involved in the same biological functions as the disordered genes, such as signal transducer, transcription regulator, enzyme binding, molecular transducer and receptor signaling protein activity, and in the same pathways, such as MAPK signaling, Apoptosis, Wnt Signaling, ErbB signaling and Cell Cycle. Lastly, we identified 3 novel genes, CHUK, INSR and CREBBP, showing high connections with the 12 novel genes reported in the literature as well as with the perturbed genes. As a result, these genes can be considered a significant finding in revealing the basis and pathways responsible for breast cancer.
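The enrichment step named in this abstract, a hypergeometric test followed by Benjamini-Hochberg correction, can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: the function names, the background size and the per-term counts are hypothetical, and only the hub-list size of 38 and the 0.05 threshold come from the abstract.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of at least k annotated genes when n genes are drawn
    from a background of N genes, K of which carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical counts: 38 hub genes drawn from a 5000-gene background.
# Each term maps to (annotated hub genes, annotated background genes).
terms = {"MAPK signaling": (9, 250), "Cell Cycle": (6, 120), "Wnt Signaling": (2, 150)}
raw = [hypergeom_pvalue(k, 38, K, 5000) for k, K in terms.values()]
adj = benjamini_hochberg(raw)
significant = [t for t, q in zip(terms, adj) if q < 0.05]
```

Using the upper tail `P(X >= k)` tests for over-representation, which is the usual choice when asking whether a gene list is enriched for a term.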
Affiliation(s)
- Yamini Chand
- Department of Bioinformatics, Karunya University, Coimbatore, India
- Md Afroz Alam
- Department of Bioinformatics, Karunya University, Coimbatore, India
|
16
|
Abstract
BACKGROUND Renal cell carcinoma or RCC is one of the common and most lethal urological cancers, with 40% of the patients succumbing to death because of metastatic progression of the disease. Treatment of metastatic RCC remains highly challenging because of its resistance to chemotherapy as well as radiotherapy, besides surgical resection. Whereas RCC comprises tumors with differing histological types, clear cell RCC remains the most common. A major problem in the clinical management of patients presenting with localized ccRCC is the inability to determine tumor aggressiveness and accurately predict the risk of metastasis following surgery. As a measure to improve the diagnosis and prognosis of RCC, researchers have identified several molecular markers through a number of techniques. However, the wealth of information available is scattered in literature and not easily amenable to data-mining. To reduce this gap, this work describes a comprehensive repository called Renal Cancer Gene Database, as an integrated gateway to study renal cancer related data. FINDINGS Renal Cancer Gene Database is a manually curated compendium of 240 protein-coding and 269 miRNA genes contributing to the etiology and pathogenesis of various forms of renal cell carcinomas. The protein coding genes have been classified according to the kind of gene alteration observed in RCC. RCDB also includes the miRNAs dysregulated in RCC, along with the corresponding information regarding the type of RCC and/or metastatic or prognostic significance. While some of the miRNA genes showed an association with other types of cancers, few were unique to RCC. Users can query the database using keywords, category and chromosomal location of the genes. The knowledgebase can be freely accessed via a user-friendly web interface at http://www.juit.ac.in/attachments/jsr/rcdb/homenew.html. CONCLUSIONS It is hoped that this database would serve as a useful complement to the existing public resources and as a good starting point for researchers and physicians interested in RCC genetics.
Affiliation(s)
- Jayashree Ramana
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, 173234, Waknaghat, Solan, Himachal Pradesh, India.
|
17
|
Seyhan AA, Varadarajan U, Choe S, Liu W, Ryan TE. A genome-wide RNAi screen identifies novel targets of neratinib resistance leading to identification of potential drug resistant genetic markers. Mol Biosyst 2012; 8:1553-70. [PMID: 22446932 DOI: 10.1039/c2mb05512k] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/21/2022]
Abstract
Neratinib (HKI-272) is a small molecule tyrosine kinase inhibitor of the ErbB receptor family currently in Phase III clinical trials. Despite its efficacy, the mechanism of potential cellular resistance to neratinib and genes involved with it remains unknown. We have used a pool-based lentiviral genome-wide functional RNAi screen combined with a lethal dose of neratinib to discover chemoresistant interactions with neratinib. Our screen has identified a collection of genes whose inhibition by RNAi led to neratinib resistance including genes involved in oncogenesis (e.g. RAB33A, RAB6A and BCL2L14), transcription factors (e.g. FOXP4, TFEC, ZNF), cellular ion transport (e.g. CLIC3, TRAPPC2P1, P2RX2), protein ubiquitination (e.g. UBL5), cell cycle (e.g. CCNF), and genes known to interact with breast cancer-associated genes (e.g. CCNF, FOXP4, TFEC, several ZNF factors, GNA13, IGFBP1, PMEPA1, SOX5, RAB33A, RAB6A, FXR1, DDO, TFEC, OLFM2). The identification of novel mediators of cellular resistance to neratinib could lead to the identification of new or neoadjuvant drug targets. Their use as patient or treatment selection biomarkers could make the application of anti-ErbB therapeutics more clinically effective.
Affiliation(s)
- Attila A Seyhan
- Systems Biology, Global Biotherapeutics, Pfizer Inc., 200 Cambridgepark Drive, Cambridge, MA 02140, USA.
|
18
|
Lahti L, Schäfer M, Klein HU, Bicciato S, Dugas M. Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review. Brief Bioinform 2012; 14:27-35. [PMID: 22441573 PMCID: PMC3548603 DOI: 10.1093/bib/bbs005] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022] Open
Abstract
A variety of genome-wide profiling techniques are available to investigate complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we highlight common approaches to genomic data integration and provide a transparent benchmarking procedure to quantitatively compare method performances in cancer gene prioritization. Algorithms, data sets and benchmarking results are available at http://intcomp.r-forge.r-project.org.
Affiliation(s)
- Leo Lahti
- Wageningen University, Laboratory of Microbiology, 6703HB Wageningen, Netherlands.
|
19
|
Kibriya MG, Jasmine F, Roy S, Paul-Brutus RM, Argos M, Ahsan H. Analyses and interpretation of whole-genome gene expression from formalin-fixed paraffin-embedded tissue: an illustration with breast cancer tissues. BMC Genomics 2010; 11:622. [PMID: 21059268 PMCID: PMC3091761 DOI: 10.1186/1471-2164-11-622] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 05/18/2010] [Accepted: 11/08/2010] [Indexed: 12/03/2022] Open
Abstract
Background We evaluated (a) the feasibility of whole genome cDNA-mediated Annealing, Selection, extension and Ligation (DASL) assay on formalin-fixed paraffin-embedded (FFPE) tissue and (b) whether similar conclusions can be drawn by examining FFPE samples as proxies for fresh frozen (FF) tissues. We used a whole genome DASL assay (addressing 18,391 genes) on a total of 72 samples from paired breast tumor and surrounding healthy tissues from both FF and FFPE samples. Results Gene detection was very good with comparable success between the FFPE and FF samples. Reproducibility was also high (r² = 0.98); however, concordance between the two types of samples was low. Only one-third of the differentially expressed genes in tumor tissues (compared to corresponding normal) from FF samples could be detected in FFPE samples and conversely only one-fourth of the differentially expressed genes from FFPE samples could be detected in FF samples. GO-enrichment analysis, gene set enrichment analysis (GSEA) and GO-ANOVA analyses also suggested small overlap between the lead functional groups that were differentially expressed in tumor detectable by examining FFPE and FF samples. In other words, FFPE samples may not be ideal for picking individual target gene(s), but may be used to identify some of the lead functional group(s) of genes that are differentially expressed in tumor. The differentially expressed genes in breast cancer found in our study were biologically meaningful. The "cell cycle" & "cell division" related genes were up-regulated and genes related to "regulation of epithelial cell proliferation" were down-regulated. Conclusions Gene expression experiments using the DASL assay can efficiently handle fragmentation issues in the FFPE tissues. However, formalin fixation seems to change RNA and consequently significantly alters gene expression in a number of genes which may not be uniform between tumor and normal tissues. Therefore, considerable caution needs to be taken when interpreting gene expression data from FFPE tissues, especially in relation to specific genes.
Affiliation(s)
- Muhammad G Kibriya
- Department of Health Studies, The University of Chicago, 5841 S. Maryland Avenue, MC 2007, Chicago, IL 60637, USA
|
20
|
Agarwal SM, Raghav D, Singh H, Raghava GPS. CCDB: a curated database of genes involved in cervix cancer. Nucleic Acids Res 2010; 39:D975-9. [PMID: 21045064 PMCID: PMC3013652 DOI: 10.1093/nar/gkq1024] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Indexed: 12/21/2022] Open
Abstract
The Cervical Cancer gene DataBase (CCDB, http://crdd.osdd.net/raghava/ccdb) is a manually curated catalog of experimentally validated genes that are thought, or are known, to be involved in the different stages of cervical carcinogenesis. Despite the large population of women presently affected by this malignancy, no database yet exists that catalogs information on genes associated with cervical cancer. Therefore, we have compiled 537 genes in CCDB that are linked with cervical cancer causation processes such as methylation, gene amplification, mutation, polymorphism and change in expression level, as evident from published literature. Each record contains gene details such as architecture (exon–intron structure), location, function, sequences (mRNA/CDS/protein), ontology, interacting partners, homology to other eukaryotic genomes, structure and links to other public databases, thus augmenting CCDB with external data. Also, manually curated literature references have been provided to support the inclusion of the gene in the database and establish its association with cervix cancer. In addition, CCDB provides information on microRNA altered in cervical cancer as well as a search facility for querying, several browse options and an online tool for sequence similarity search, thereby providing researchers with easy access to the latest information on genes involved in cervix cancer.
Affiliation(s)
- Subhash M Agarwal
- Bioinformatics Division, Institute of Cytology and Preventive Oncology, I-7, Sector 39, Noida 201301, India.
|
21
|
Gong X, Wu R, Zhang Y, Zhao W, Cheng L, Gu Y, Zhang L, Wang J, Zhu J, Guo Z. Extracting consistent knowledge from highly inconsistent cancer gene data sources. BMC Bioinformatics 2010; 11:76. [PMID: 20137077 PMCID: PMC2832783 DOI: 10.1186/1471-2105-11-76] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 09/15/2009] [Accepted: 02/05/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Hundreds of genes that are causally implicated in oncogenesis have been found and collected in various databases. For efficient application of these abundant but diverse data sources, it is of fundamental importance to evaluate their consistency. RESULTS First, we showed that the lists of cancer genes from some major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, based on a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly functionally consistent. Finally, we extracted functionally consistent cancer genes from various data sources and collected them in our database F-Census. CONCLUSIONS Although they have very low gene overlapping, most cancer gene data sources are highly consistent at the functional level, which indicates that they can separately capture partial genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used for cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a number of cancer genes that could represent functionally most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
Affiliation(s)
- Xue Gong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
|
22
|
Finding disease-specific coordinated functions by multi-function genes: insight into the coordination mechanisms in diseases. Genomics 2009; 94:94-100. [PMID: 19427897 DOI: 10.1016/j.ygeno.2009.05.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 02/19/2009] [Accepted: 05/04/2009] [Indexed: 12/31/2022]
Abstract
We developed an approach using multi-function disease genes to find function pairs whose co-deregulation might induce a disease. Analyzing cancer genes, we found many cancer-specific coordinated function pairs co-deregulated by dysfunction of multi-function genes and other molecular changes in cancer. Studying two subtypes of cardiomyopathy, we found they show certain consistency at the functional coordination level. Our approach can also provide important information for finding novel disease genes as well as their mechanisms in diseases.
|
23
|
Network Properties for Ranking Predicted miRNA Targets in Breast Cancer. Adv Bioinformatics 2009:182689. [PMID: 20224638 PMCID: PMC2833297 DOI: 10.1155/2009/182689] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/02/2009] [Accepted: 11/15/2009] [Indexed: 11/22/2022] Open
Abstract
MicroRNAs control the expression of their target genes by translational repression and transcriptional cleavage. They are involved in various biological processes including development and progression of cancer. To uncover the biological role of miRNAs it is important to identify their target genes. The small number of experimentally validated target genes makes computer prediction methods very important. However, state-of-the-art prediction tools result in a great number of putative targets with an unpredictable number of false positives. In this paper, we propose and evaluate two approaches for ranking the biological relevance of putative targets of miRNAs which are associated with breast cancer.
|
24
|
Cheng D, Knox C, Young N, Stothard P, Damaraju S, Wishart DS. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res 2008; 36:W399-405. [PMID: 18487273 PMCID: PMC2447794 DOI: 10.1093/nar/gkn296] [Citation(s) in RCA: 160] [Impact Index Per Article: 9.4] [Indexed: 11/29/2022] Open
Abstract
A particular challenge in biomedical text mining is to find ways of handling ‘comprehensive’ or ‘associative’ queries such as ‘Find all genes associated with breast cancer’. Given that many queries in genomics, proteomics or metabolomics involve these kinds of comprehensive searches, we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is ‘Given X, find all Y's’ where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch's performance has been assessed in tasks such as gene synonym identification, protein–protein interaction identification and disease gene identification using a variety of manually assembled ‘gold standard’ text corpuses. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at http://wishart.biology.ualberta.ca/polysearch
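For readers unfamiliar with the f-measure figures quoted in this abstract, the following sketch shows how such a score is commonly computed from a tool's output and a gold standard. It assumes the balanced F1 variant by default, and the gene sets and the function name `f_measure` are hypothetical and not drawn from the PolySearch evaluation.

```python
def f_measure(predicted, gold, beta=1.0):
    """F-measure of a predicted set against a gold-standard set."""
    tp = len(predicted & gold)   # correctly retrieved items
    fp = len(predicted - gold)   # retrieved but not in the gold standard
    fn = len(gold - predicted)   # gold-standard items that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical answer to "find all genes associated with breast cancer".
predicted = {"BRCA1", "BRCA2", "TP53", "EGFR"}
gold = {"BRCA1", "BRCA2", "TP53", "ERBB2", "ESR1"}
score = f_measure(predicted, gold)
print(f"F1 = {score:.3f}")
```

Here precision is 3/4 and recall is 3/5, so the harmonic mean is 2/3; raising `beta` above 1 would weight recall more heavily.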
Affiliation(s)
- Dean Cheng
- Department of Computing Science, University of Alberta, Canada
|
25
|
Masseroli M. Management and analysis of genomic functional and phenotypic controlled annotations to support biomedical investigation and practice. IEEE Trans Inf Technol Biomed 2007; 11:376-85. [PMID: 17674620 DOI: 10.1109/titb.2006.884367] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
The growing available genomic information provides new opportunities for novel research approaches and original biomedical applications that can provide effective data management and analysis support. In fact, integration and comprehensive evaluation of available controlled data can highlight information patterns leading to unveil new biomedical knowledge. Here, we describe Genome Function INtegrated Discover (GFINDer), a Web-accessible three-tier multidatabase system we developed to automatically enrich lists of user-classified genes with several functional and phenotypic controlled annotations, and to statistically evaluate them in order to identify annotation categories significantly over- or underrepresented in each considered gene class. Genomic controlled annotations from Gene Ontology (GO), KEGG, Pfam, InterPro, and Online Mendelian Inheritance in Man (OMIM) were integrated in GFINDer and several categorical tests were implemented for their analysis. A controlled vocabulary of inherited disorder phenotypes was obtained by normalizing and hierarchically structuring disease accompanying signs and symptoms from OMIM Clinical Synopsis sections. GFINDer modular architecture is well suited for further system expansion and for sustaining increasing workload. Testing results showed that GFINDer analyses can highlight gene functional and phenotypic characteristics and differences, demonstrating its value in supporting genomic biomedical approaches aiming at understanding the complex biomolecular mechanisms underlying patho-physiological phenotypes, and in helping the transfer of genomic results to medical practice.
Affiliation(s)
- Marco Masseroli
- BioMedical Informatics Laboratory, Dipartimento di Bioingegneria, Politecnico di Milano, I-20133 Milan, Italy.
|
26
|
Wishart DS. Discovering drug targets through the web. Comp Biochem Physiol Part D Genomics Proteomics 2006; 2:9-17. [PMID: 20483274 DOI: 10.1016/j.cbd.2006.01.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 11/22/2005] [Revised: 01/28/2006] [Accepted: 01/30/2006] [Indexed: 11/25/2022]
Abstract
Traditionally, drug-target discovery is a "wet-bench" experimental process, depending on carefully designed genetic screens, biochemical tests and cellular assays to identify proteins and genes that are associated with a particular disease or condition. However, recent advances in DNA sequencing, transcript profiling, protein identification and protein quantification are leading to a flood of genomic and proteomic data that is, or potentially could be, linked to disease data. The quantity of data generated by these high throughput methods is forcing scientists to re-think the way they do traditional drug-target discovery. In particular it is leading them more and more towards identifying potential drug targets using computers. In fact, drug-target identification is now being done as much on the desk-top as on the bench-top. This review focuses on describing how drug-target discovery can be done in silico (i.e. via computer) using a variety of bioinformatic resources that are freely available on the web. Specifically, it highlights a number of web-accessible sequence databases, automated genome annotation tools, text mining tools; and integrated drug/sequence databases that can be used to identify drug targets for both endogenous (genetic and epigenetic) diseases as well as exogenous (infectious) diseases.
Affiliation(s)
- David S Wishart
- Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton, AB, Canada T6G 2E8
|
27
|
Masseroli M, Galati O, Manzotti M, Gibert K, Pinciroli F. Inherited disorder phenotypes: controlled annotation and statistical analysis for knowledge mining from gene lists. BMC Bioinformatics 2005; 6 Suppl 4:S18. [PMID: 16351744 PMCID: PMC1866390 DOI: 10.1186/1471-2105-6-s4-s18] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Analysis of inherited diseases and their associated phenotypes is of great importance to gain knowledge of underlying genetic interactions and could ultimately give clinically useful insights into disease processes, including complex diseases influenced by multiple genetic loci. Nevertheless, to date few computational contributions have been proposed for this purpose, mainly due to lack of controlled clinical information easily accessible and structured for computational genome-wise analyses. To allow performing phenotype analyses of inherited disorder related genes we implemented new original modules within GFINDer http://www.bioinformatics.polimi.it/GFINDer/, a Web system we previously developed that dynamically aggregates functional annotations of user uploaded gene lists and allows performing their statistical analysis and mining. RESULTS New GFINDer modules allow annotating large numbers of user classified biomolecular sequence identifiers with morbidity and clinical information, classifying them according to genetic disease phenotypes and their locations of occurrence, and statistically analyzing the obtained classifications. To achieve this we exploited, normalized and structured the information present in textual form in the Clinical Synopsis sections of the Online Mendelian Inheritance in Man (OMIM) databank. Such valuable information delineates numerous signs and symptoms accompanying many genetic diseases and it is divided into phenotype location categories, either by organ system or type of finding. CONCLUSION Supporting phenotype analyses of inherited diseases and biomolecular functional evaluations, GFINDer facilitates a genomic approach to the understanding of fundamental biological processes and complex cellular mechanisms underlying patho-physiological phenotypes.
Affiliation(s)
- Marco Masseroli
- BioMedical Informatics Laboratory, Bioengineering Department, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milano, Italy
- Osvaldo Galati
- BioMedical Informatics Laboratory, Bioengineering Department, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milano, Italy
- Mauro Manzotti
- BioMedical Informatics Laboratory, Bioengineering Department, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milano, Italy
- Karina Gibert
- Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya, C. Pau Gargallo 5, 08028 Barcelona, Spain
- Francesco Pinciroli
- BioMedical Informatics Laboratory, Bioengineering Department, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milano, Italy
|
28
|
Goebell PJ, Groshen S, Schmitz-Dräger BJ, Sylvester R, Kogevinas M, Malats N, Sauter G, Barton Grossman H, Waldman F, Cote RJ. The International Bladder Cancer Bank: proposal for a new study concept. Urol Oncol 2004; 22:277-84. [PMID: 15283883 DOI: 10.1016/s1078-1439(03)00175-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 06/20/2003] [Revised: 09/18/2003] [Accepted: 10/08/2003] [Indexed: 10/26/2022]
Abstract
At present, results of marker studies are often inconsistent and sometimes contradictory. Recognized problems include multiple different methods of performing the assays, different subsets of patients and different endpoints, leading to incompatible datasets. Although there has been discussion of establishing general methodological principles and guidelines (analogous to those for clinical trials) for design, conduct, analysis, and reporting of marker studies, these have not been widely implemented. There are no well-recognized prototypes or examples that the urologic researcher can use to model future marker studies. We will discuss our plans to establish a multi-institutional bladder cancer data base and virtual tumor bank as a resource for participating institutions to evaluate the biological and prognostic significance of potential markers for bladder cancer. Samples will be identified and stored at each participating institution and will be available for analysis. A standard, minimal set of patient and pathologic information will be collected. The use of common software, as part of this proposal will facilitate the data transfer of updated patient information to a central database. All contributing centers will have access to summarized information, also to simplify the process of finding collaborating partners. Prospectively collected, consistent datasets with available long-term follow-up, should provide information sooner than with a conventional prospective study. Furthermore, the quality of these data and samples may be superior to that of retrospectively collected data and samples. The proposed International Bladder Cancer Bank of specimens and data will be an effective tool during all phases of marker development.
Affiliation(s)
- Peter J Goebell
- Department of Preventive Medicine, USC/Norris Comprehensive Cancer Center, University of Southern California, Keck School of Medicine, Los Angeles, CA, USA.
|
29
|
Narayanasamy V, Mukhopadhyay S, Palakal M, Potter DA. TransMiner: Mining transitive associations among biological objects from text. J Biomed Sci 2004; 11:864-73. [PMID: 15591784 DOI: 10.1007/bf02254372] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 03/07/2004] [Accepted: 06/16/2004] [Indexed: 11/29/2022] Open
Abstract
Associations among biological objects such as genes, proteins, and drugs can be discovered automatically from the scientific literature. TransMiner is a system for finding associations among objects by mining the Medline database of the scientific literature. Direct associations among the objects are discovered based on the principle of co-occurrence and represented as an association graph. The principle of transitive closure is applied to the association graph to find potential transitive associations. Potential transitive associations that are in fact direct are identified by iterative retrieval and mining of the Medline documents. Those associations that are not found explicitly anywhere in the Medline database are transitive associations and are the candidates for hypothesis generation. The transitive associations were ranked by the sum of the weights of terms that co-occur with both objects. The direct and transitive associations are visualized using a graph visualization applet. TransMiner was tested by finding associations among 56 breast cancer genes and among 24 objects in the calpain signal transduction pathway. TransMiner was also used to rediscover associations between magnesium and migraine.
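The pipeline this abstract describes (co-occurrence graph, transitive candidates, ranking by shared terms) can be illustrated with a small sketch. This is not TransMiner's code: the function names, the toy input, and the scoring rule (the smaller of the two co-occurrence counts, summed over shared neighbours) are assumptions made for illustration, since the abstract does not say how term weights are computed. The sketch also restricts transitivity to chains of length two and skips the iterative Medline retrieval step entirely.

```python
from collections import defaultdict
from itertools import combinations


def build_association_graph(documents):
    """Count, for each unordered pair of objects, the documents mentioning both."""
    weights = defaultdict(int)
    for objects in documents:
        for a, b in combinations(sorted(set(objects)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def neighbours(weights):
    """Adjacency map: object -> {neighbour: co-occurrence count}."""
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    return adj


def transitive_candidates(weights):
    """Rank pairs that never co-occur directly but share at least one neighbour.

    Assumed score: sum over shared neighbours of the weaker of the two links.
    """
    adj = neighbours(weights)
    ranked = []
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already a direct association
        shared = set(adj[a]) & set(adj[b])
        if not shared:
            continue  # no two-step chain links a and b
        score = sum(min(adj[a][t], adj[b][t]) for t in shared)
        ranked.append((a, b, score))
    ranked.sort(key=lambda item: (-item[2], item[0], item[1]))
    return ranked


# Made-up input: each inner list stands for the objects found in one document.
toy_documents = [
    ["A", "B"],
    ["B", "C"],
    ["A", "D"],
    ["D", "C"],
    ["D", "C"],
    ["E", "B"],
]
ranking = transitive_candidates(build_association_graph(toy_documents))
for a, b, score in ranking:
    print(a, b, score)
```

In this toy corpus A and C never appear together, yet both co-occur with B and D, so the pair surfaces as a candidate hypothesis. A real system would still need to query the full literature to confirm that no direct association exists.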
Affiliation(s)
- Vijay Narayanasamy
- School of Informatics, Indiana University School of Medicine, Indiana University Purdue University Indianapolis, Ind., USA
|
31
|
Hu Y, Hines LM, Weng H, Zuo D, Rivera M, Richardson A, LaBaer J. Analysis of genomic and proteomic data using advanced literature mining. J Proteome Res 2003; 2:405-12. [PMID: 12938930 DOI: 10.1021/pr0340227] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
High-throughput technologies, such as proteomic screening and DNA micro-arrays, produce vast amounts of data requiring comprehensive analytical methods to decipher the biologically relevant results. One approach would be to manually search the biomedical literature; however, this would be an arduous task. We developed an automated literature-mining tool, termed MedGene, which comprehensively summarizes and estimates the relative strengths of all human gene-disease relationships in Medline. Using MedGene, we analyzed a novel micro-array expression dataset comparing breast cancer and normal breast tissue in the context of existing knowledge. We found no correlation between the strength of the literature association and the magnitude of the difference in expression level when considering changes as high as 5-fold; however, a significant correlation was observed (r = 0.41; p = 0.05) among genes showing an expression difference of 10-fold or more. Interestingly, this only held true for estrogen receptor (ER) positive tumors, not ER negative. MedGene identified a set of relatively understudied, yet highly expressed genes in ER negative tumors worthy of further examination.
Affiliation(s)
- Yanhui Hu
- Institute of Proteomics, Harvard Medical School-BCMP, 240 Longwood Avenue, Boston, Massachusetts 02115, USA
|
32
|
Abstract
The Oral Cancer Gene Database (OrCGDB; http://www.tumor-gene.org/Oral/oral.html) was developed to provide the biomedical community with easy access to the latest information on the genes involved in oral cancer. The information is stored in a relational database and accessed through a WWW interface. The OrCGDB is organized by gene name, which is linked to information describing properties of the gene. This information is stored as a collection of findings ('facts') that are entered by the database curator in a semi-structured format from information in primary publications using a WWW interface. These facts include causes of oncogenic activation, chromosomal localization of the gene, mutations associated with the gene, the biochemical identity and activity of the gene product, synonyms for the gene name and a variety of clinical information. Each fact is associated with a MEDLINE citation. The user can search the OrCGDB by gene name or by entering a textword. The OrCGDB is part of a larger WWW-based tumor gene database and represents a new approach to catalog and display the research literature.
Affiliation(s)
- A E Levine
- Department of Basic Sciences, The University of Texas Health Science Center Houston, Dental Branch, Houston, TX 77030, USA.
|
33
|
Drysdale R, Bayraktaroglu L. Current awareness. Yeast 2000; 17:159-66. [PMID: 10900461 PMCID: PMC2448328 DOI: 10.1155/2000/907141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
In order to keep subscribers up-to-date with the latest developments in their field, this current awareness service is provided by John Wiley & Sons and contains newly-published material on comparative and functional genomics. Each bibliography is divided into 16 sections: 1 Reviews & symposia; 2 General; 3 Large-scale sequencing and mapping; 4 Genome evolution; 5 Comparative genomics; 6 Gene families and regulons; 7 Pharmacogenomics; 8 Large-scale mutagenesis programmes; 9 Functional complementation; 10 Transcriptomics; 11 Proteomics; 12 Protein structural genomics; 13 Metabolomics; 14 Genomic approaches to development; 15 Technological advances; 16 Bioinformatics. Within each section, articles are listed in alphabetical order with respect to author. If, in the preceding period, no publications are located relevant to any one of these headings, that section will be omitted.
Affiliation(s)
- R Drysdale
- FlyBase-Cambridge, Department of Genetics, University of Cambridge, UK
|
34
|
|