1
Yu C, Qi X, Yan W, Wu W, Shen B. Next-Generation Sequencing Markup Language (NGSML): A Medium for the Representation and Exchange of NGS Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:576-585. [PMID: 35085089 DOI: 10.1109/tcbb.2022.3144170] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
With the increasing demand for low-cost high-throughput sequencing of large genomes, next-generation sequencing (NGS) technology has developed rapidly. NGS can not only be used in basic scientific research but also in clinical diagnostics and healthcare. Numerous software systems and tools have been developed to analyze NGS data, and various data formats have been produced to accommodate different sequencing equipment providers or analytical software. However, the data interoperability between these tools brings great challenges to researchers. A generic format that could be shared by most of the software and tools in the NGS field would make data interoperability and sharing easier. In this paper, we defined a general XML-based NGS markup language (NGSML) format for the representation and exchange of NGS data. We also developed a user-friendly GUI tool, NGSMLEditor, for presenting, creating, editing, and converting NGSML files. By using NGSML, various types of NGS data can be saved in one unified format. Compared with the unstructured plain text file, a structured data format based on XML technology solves the incompatibility of various NGS data formats. The NGSML specifications are freely available from http://www.sysbio.org.cn/NGSML. NGSMLEditor is open source under GNU GPL and can be downloaded from the website.
2
Zhou X, Yao L, Zhou X, Cong R, Luan J, Wei X, Zhang X, Song N. Pyroptosis-Related lncRNA Prognostic Model for Renal Cancer Contributes to Immunodiagnosis and Immunotherapy. Front Oncol 2022; 12:837155. [PMID: 35860590 PMCID: PMC9291251 DOI: 10.3389/fonc.2022.837155] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/16/2021] [Accepted: 06/06/2022] [Indexed: 12/25/2022]
Abstract
BACKGROUND Renal clear cell cancer (ccRCC) is one of the most common cancers in humans. Thus, we aimed to construct a risk model to predict the prognosis of ccRCC effectively. METHODS We downloaded RNA sequencing (RNA-seq) data and clinical information of 539 kidney renal clear cell carcinoma (KIRC) patients and 72 normal humans from The Cancer Genome Atlas (TCGA) database and divided the data into training and testing groups randomly. Pyroptosis-related lncRNAs (PRLs) were obtained through Pearson correlation between pyroptosis genes and all lncRNAs (p < 0.05, coeff > 0.3). Univariate and multivariate Cox regression analyses were then performed to select suitable lncRNAs. Next, a novel signature was constructed and evaluated by survival analysis and ROC analysis. The same observation applies to the testing group to validate the value of the signature. By gene set enrichment analysis (GSEA), we predicted the underlying signaling pathway. Furthermore, we calculated immune cell infiltration, immune checkpoint, the T-cell receptor/B-cell receptor (TCR/BCR), SNV, and Tumor Immune Dysfunction and Exclusion (TIDE) scores in TCGA database. We also validated our model with an immunotherapy cohort. Finally, the expression of PRLs was validated by quantitative PCR (qPCR). RESULTS We constructed a prognostic signature composed of six key lncRNAs (U62317.1, MIR193BHG, LINC02027, AC121338.2, AC005785.1, AC156455.1), which significantly predict different overall survival (OS) rates. The efficiency was demonstrated using the receiver operating characteristic (ROC) curve. The signature was observed to be an independent prognostic factor in cohorts. In addition, we found the PRLs promote the tumor progression via immune-related pathways revealed in GSEA. Furthermore, the TCR, BCR, and SNV data were retrieved to screen immune features, and immune cell scores were calculated to measure the effect of the immune microenvironment on the risk model, indicating that high- and low-risk scores have different immune statuses. The TIDE algorithm was then used to predict the immune checkpoint blockade (ICB) response of our model, and subclass mapping was used to verify our model in another immunotherapy cohort data. Finally, qPCR validated the PRLs in cell lines. CONCLUSION This study provided a new risk model to evaluate ccRCC and may point to pyroptosis-related therapeutic targets in the clinic.
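The correlation-based selection of pyroptosis-related lncRNAs described in this abstract (p < 0.05, coeff > 0.3) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the toy expression values, and the use of a permutation test for the p-value are all assumptions made for illustration; the abstract does not state how the p-value was computed or whether the coefficient threshold applies to the absolute value.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated as no correlation here
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value (an assumed stand-in for the usual t-test)."""
    observed = abs(pearson_r(x, y))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_related_lncrnas(gene_expr, lncrna_expr, r_min=0.3, p_max=0.05):
    """Keep lncRNAs correlated with at least one gene (r > r_min and p < p_max)."""
    selected = set()
    for lnc_name, lnc_values in lncrna_expr.items():
        for gene_values in gene_expr.values():
            r = pearson_r(gene_values, lnc_values)
            if r > r_min and permutation_p_value(gene_values, lnc_values) < p_max:
                selected.add(lnc_name)
                break
    return sorted(selected)


# Hypothetical toy data: one gene and two lncRNAs measured in eight samples.
genes = {"GENE_1": [1, 2, 3, 4, 5, 6, 7, 8]}
lncrnas = {
    "LNC_A": [2, 4, 5, 8, 10, 12, 13, 16],  # rises with the gene
    "LNC_B": [5, 1, 4, 2, 5, 1, 4, 2],      # no positive trend
}
print(select_related_lncrnas(genes, lncrnas))
```

In a real analysis the inputs would be expression matrices over hundreds of samples, and the retained lncRNAs would then go into the Cox regression step mentioned in the abstract.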
Affiliation(s)
- Xuan Zhou
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Liangyu Yao
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xiang Zhou
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Rong Cong
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Jiaochen Luan
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xiyi Wei
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Xu Zhang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Ninghong Song
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
  - *Correspondence: Ninghong Song
3
Belcher Dufrisne M, Swope N, Kieber M, Yang JY, Han J, Li J, Moremen KW, Prestegard JH, Columbus L. Human CEACAM1 N-domain dimerization is independent from glycan modifications. Structure 2022; 30:658-670.e5. [PMID: 35219398 PMCID: PMC9081242 DOI: 10.1016/j.str.2022.02.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/25/2021] [Revised: 11/15/2021] [Accepted: 02/01/2022] [Indexed: 12/31/2022]
Abstract
Carcinoembryonic cellular adhesion molecules (CEACAMs) serve diverse roles in cell signaling, proliferation, and survival and are made up of one or several immunoglobulin (Ig)-like ectodomains glycosylated in vivo. The physiological oligomeric state and how it contributes to protein function are central to understanding CEACAMs. Two putative dimer conformations involving different CEACAM1 N-terminal Ig-like domain (CCM1) protein faces (ABED and GFCC'C″) were identified from crystal structures. GFCC'C″ was identified as the dominant CCM1 solution dimer, but ambiguity regarding the effect of glycosylation on dimer formation calls its physiological relevance into question. We present the first crystal structure of minimally glycosylated CCM1 in the GFCC'C″ dimer conformation and characterization in solution by continuous-wave and double electron-electron resonance electron paramagnetic resonance spectroscopy. Our results suggest the GFCC'C″ dimer is dominant in solution with different levels of glycosylation, and structural conservation and co-evolved residues support that the GFCC'C″ dimer is conserved across CEACAMs.
Affiliation(s)
- Nicole Swope
  - Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
- Marissa Kieber
  - Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
- Jeong-Yeh Yang
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
- Ji Han
  - Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
- Jason Li
  - Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
- Kelley W Moremen
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
- James H Prestegard
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
- Linda Columbus
  - Department of Chemistry, University of Virginia, Charlottesville, VA 22904, USA
4
Perez I, Berndt S, Agarwal R, Castro MA, Vishnivetskiy SA, Smith JC, Sanders CR, Gurevich VV, Iverson TM. A Model for the Signal Initiation Complex Between Arrestin-3 and the Src Family Kinase Fgr. J Mol Biol 2022; 434:167400. [PMID: 34902430 PMCID: PMC8752512 DOI: 10.1016/j.jmb.2021.167400] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/09/2021] [Revised: 11/24/2021] [Accepted: 12/04/2021] [Indexed: 02/01/2023]
Abstract
Arrestins regulate a wide range of signaling events, most notably when bound to active G protein-coupled receptors (GPCRs). Among the known effectors recruited by GPCR-bound arrestins are Src family kinases, which regulate cellular growth and proliferation. Here, we focus on arrestin-3 interactions with Fgr kinase, a member of the Src family. Previous reports demonstrated that Fgr exhibits high constitutive activity, but can be further activated by both arrestin-dependent and arrestin-independent pathways. We report that arrestin-3 modulates Fgr activity with a hallmark bell-shaped concentration-dependence, consistent with a role as a signaling scaffold. We further demonstrate using NMR spectroscopy that a polyproline motif within arrestin-3 interacts directly with the SH3 domain of Fgr. To provide a framework for this interaction, we determined the crystal structure of the Fgr SH3 domain at 1.9 Å resolution and developed a model for the GPCR-arrestin-3-Fgr complex that is supported by mutagenesis. This model suggests that Fgr interacts with arrestin-3 at multiple sites and is consistent with the locations of disease-associated Fgr mutations. Collectively, these studies provide a structural framework for arrestin-dependent activation of Fgr.
Affiliation(s)
- Ivette Perez
  - Department of Biochemistry, Vanderbilt University, Nashville, TN 37232-0146, USA; Center for Structural Biology, Nashville, TN 37232-0146, USA
- Sandra Berndt
  - Department of Pharmacology, Vanderbilt University, Nashville, TN 37232-0146, USA; Center for Structural Biology, Nashville, TN 37232-0146, USA
- Rupesh Agarwal
  - Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA; UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, USA
- Manuel A Castro
  - Department of Biochemistry, Vanderbilt University, Nashville, TN 37232-0146, USA; Center for Structural Biology, Nashville, TN 37232-0146, USA
- Jeremy C Smith
  - Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA; UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, TN, USA
- Charles R Sanders
  - Department of Biochemistry, Vanderbilt University, Nashville, TN 37232-0146, USA; Center for Structural Biology, Nashville, TN 37232-0146, USA
- Vsevolod V Gurevich
  - Department of Pharmacology, Vanderbilt University, Nashville, TN 37232-0146, USA
- T M Iverson
  - Department of Biochemistry, Vanderbilt University, Nashville, TN 37232-0146, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN 37232-0146, USA; Center for Structural Biology, Nashville, TN 37232-0146, USA; Vanderbilt Institute of Chemical Biology, Nashville, TN 37232-0146, USA
5
Altered Protein Abundance and Localization Inferred from Sites of Alternative Modification by Ubiquitin and SUMO. J Mol Biol 2021; 433:167219. [PMID: 34464654 DOI: 10.1016/j.jmb.2021.167219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/03/2020] [Revised: 08/11/2021] [Accepted: 08/23/2021] [Indexed: 12/19/2022]
Abstract
Protein modification by ubiquitin or SUMO can alter the function, stability or activity of target proteins. Previous studies have identified thousands of substrates that were modified by ubiquitin or SUMO on the same lysine residue. However, it remains unclear whether such overlap could result from a mere higher solvent accessibility, whether proteins containing those sites are associated with specific functional traits, and whether selectively perturbing their modification by ubiquitin or SUMO could result in different phenotypic outcomes. Here, we mapped reported lysine modification sites across the human proteome and found an enrichment of sites reported to be modified by both ubiquitin and SUMO. Our analysis uncovered thousands of proteins containing such sites, which we term Sites of Alternative Modification (SAMs). Among more than 36,000 sites reported to be modified by SUMO, 51.8% have also been reported to be modified by ubiquitin. SAM-containing proteins are associated with diverse biological functions including cell cycle, DNA damage, and transcriptional regulation. As such, our analysis highlights numerous proteins and pathways as putative targets for further elucidating the crosstalk between ubiquitin and SUMO. Comparing the biological and biochemical properties of SAMs versus other non-overlapping modification sites revealed that these sites were associated with altered cellular localization or abundance of their host proteins. Lastly, using S. cerevisiae as model, we show that mutating the SAM motif in a protein can influence its ubiquitination as well as its localization and abundance.
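The overlap statistic quoted in this abstract (the share of SUMO-modified lysines also reported as ubiquitin-modified) comes down to a set intersection over site identifiers. The sketch below is illustrative only, not the authors' pipeline. The function name, the `(protein_id, lysine_position)` representation, and the toy site lists are hypothetical.

```python
def alternative_modification_sites(ubiquitin_sites, sumo_sites):
    """Return sites reported for both modifications and their share of SUMO sites.

    Each site is assumed to be a (protein_id, lysine_position) tuple.
    """
    ub = set(ubiquitin_sites)
    su = set(sumo_sites)
    shared = ub & su  # lysines reported with both ubiquitin and SUMO
    fraction = len(shared) / len(su) if su else 0.0
    return shared, fraction


# Hypothetical toy input; real site lists would come from a modification database.
ubiquitin = [("P1", 10), ("P1", 20), ("P2", 5)]
sumo = [("P1", 10), ("P2", 5), ("P3", 7), ("P3", 9)]

shared_sites, sumo_fraction = alternative_modification_sites(ubiquitin, sumo)
print(sorted(shared_sites), f"{sumo_fraction:.1%}")
```

Applied to the reported data, the same calculation would give the 51.8% figure stated in the abstract, given the full site lists as input.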
6
Torres-Arancivia CM, Chang D, Hackett WE, Zaia J, Connors LH. Glycosylation of Serum Clusterin in Wild-Type Transthyretin-Associated (ATTRwt) Amyloidosis: A Study of Disease-Associated Compositional Features Using Mass Spectrometry Analyses. Biochemistry 2020; 59:4367-4378. [PMID: 33141553 PMCID: PMC8082438 DOI: 10.1021/acs.biochem.0c00590] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
Abstract
Wild-type transthyretin-associated (ATTRwt) amyloidosis is an age-related disease that causes heart failure in older adults. This disease frequently features cardiac amyloid fibril deposits that originate from dissociation of the tetrameric protein, transthyretin (TTR). Unlike hereditary TTR (ATTRm) amyloidosis, where amino acid replacements destabilize the native protein, in ATTRwt amyloidosis, amyloid-forming TTR lacks protein sequence alterations. The initiating cause of fibril formation in ATTRwt amyloidosis is unclear, and thus, it seems plausible that other factors are involved in TTR misfolding and unregulated accumulation of wild-type TTR fibrils. We believe that clusterin (CLU, UniProtKB P10909), a plasma circulating glycoprotein, plays a role in the pathobiology of ATTRwt amyloidosis. Previously, we have suggested a role for CLU in ATTRwt amyloidosis based on our studies showing that (1) CLU codeposits with non-native TTR in amyloid fibrils from ATTRwt cardiac tissue, (2) CLU interacts only with non-native (monomeric and aggregated) forms of TTR, and (3) CLU serum levels in patients with ATTRwt are significantly lower compared to healthy controls. In the present study, we provide comprehensive detail of compositional findings from mass spectrometry analyses of amino acid and glycan content of CLU purified from ATTRwt and control sera. The characterization of oligosaccharide content in serum CLU derived from patients with ATTRwt amyloidosis is novel data. Moreover, results comparing CLU oligosaccharide variations between patient and healthy controls are original and provide further evidence for the role of CLU in ATTRwt pathobiology, possibly linked to disease-specific structural features that limit the chaperoning capacity of CLU.
7
Dingerdissen HM, Bastian F, Vijay-Shanker K, Robinson-Rechavi M, Bell A, Gogate N, Gupta S, Holmes E, Kahsay R, Keeney J, Kincaid H, King CH, Liu D, Crichton DJ, Mazumder R. OncoMX: A Knowledgebase for Exploring Cancer Biomarkers in the Context of Related Cancer and Healthy Data. JCO Clin Cancer Inform 2020; 4:210-220. [PMID: 32142370 PMCID: PMC7101249 DOI: 10.1200/cci.19.00117] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/31/2022]
Abstract
PURPOSE The purpose of OncoMX knowledgebase development was to integrate cancer biomarker and relevant data types into a meta-portal, enabling the research of cancer biomarkers side by side with other pertinent multidimensional data types. METHODS Cancer mutation, cancer differential expression, cancer expression specificity, healthy gene expression from human and mouse, literature mining for cancer mutation and cancer expression, and biomarker data were integrated, unified by relevant biomedical ontologies, and subjected to rule-based automated quality control before ingestion into the database. RESULTS OncoMX provides integrated data encompassing more than 1,000 unique biomarker entries (939 from the Early Detection Research Network [EDRN] and 96 from the US Food and Drug Administration) mapped to 20,576 genes that have either mutation or differential expression in cancer. Sentences reporting mutation or differential expression in cancer were extracted from more than 40,000 publications, and healthy gene expression data with samples mapped to organs are available for both human genes and their mouse orthologs. CONCLUSION OncoMX has prioritized user feedback as a means of guiding development priorities. By mapping to and integrating data from several cancer genomics resources, it is hoped that OncoMX will foster a dynamic engagement between bioinformaticians and cancer biomarker researchers. This engagement should culminate in a community resource that substantially improves the ability and efficiency of exploring cancer biomarker data and related multidimensional data.
Affiliation(s)
- Frederic Bastian
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Amanda Bell
  - The George Washington University, Washington DC
- Evan Holmes
  - The George Washington University, Washington DC
- David Liu
  - NASA Jet Propulsion Laboratory, Pasadena, CA
8
Rujas E, Cui H, Sicard T, Semesi A, Julien JP. Structural characterization of the ICOS/ICOS-L immune complex reveals high molecular mimicry by therapeutic antibodies. Nat Commun 2020; 11:5066. [PMID: 33033255 PMCID: PMC7545189 DOI: 10.1038/s41467-020-18828-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 06/08/2020] [Accepted: 09/15/2020] [Indexed: 12/20/2022]
Abstract
The inducible co-stimulator (ICOS) is a member of the CD28/B7 superfamily, and delivers a positive co-stimulatory signal to activated T cells upon binding to its ligand (ICOS-L). Dysregulation of this pathway has been implicated in autoimmune diseases and cancer, and is currently under clinical investigation as an immune checkpoint blockade. Here, we describe the molecular interactions of the ICOS/ICOS-L immune complex at 3.3 Å resolution. A central FDPPPF motif and residues within the CC' loop of ICOS are responsible for the specificity of the interaction with ICOS-L, with a distinct receptor binding orientation in comparison to other family members. Furthermore, our structure and binding data reveal that the ICOS N110 N-linked glycan participates in ICOS-L binding. In addition, we report crystal structures of ICOS and ICOS-L in complex with monoclonal antibodies under clinical evaluation in immunotherapy. Strikingly, antibody paratopes closely mimic receptor-ligand binding core interactions, in addition to contacting peripheral residues to confer high binding affinities. Our results uncover key molecular interactions of an immune complex central to human adaptive immunity and have direct implications for the ongoing development of therapeutic interventions targeting immune checkpoint receptors.
Affiliation(s)
- Edurne Rujas
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada; Biofisika Institute (CSIC, UPV/EHU) and Department of Biochemistry and Molecular Biology, University of the Basque Country (UPV/EHU), P.O. Box 644, 48080, Bilbao, Spain
- Hong Cui
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada
- Taylor Sicard
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada; Department of Biochemistry, University of Toronto, Toronto, ON, M5S 1A8, Canada
- Anthony Semesi
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada
- Jean-Philippe Julien
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON, M5G 0A4, Canada; Department of Biochemistry, University of Toronto, Toronto, ON, M5S 1A8, Canada; Department of Immunology, University of Toronto, Toronto, ON, M5S 1A8, Canada
9
Bhattacharya S, Sah PP, Banerjee A, Ray S. Exploring Single Nucleotide Polymorphisms in ITGAV for Gastric, Pancreatic and Liver Malignancies: An Approach Towards the Discovery of Biomarker. Comb Chem High Throughput Screen 2020; 24:860-873. [PMID: 32819225 DOI: 10.2174/1386207323999200818164104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/22/2020] [Revised: 06/17/2020] [Accepted: 07/22/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Integrin αV, encoded by the ITGAV gene, is one of the most studied protein subunits, closely associated with liver, pancreatic and stomach cancer progression and metastasis via regulation of angiogenesis. The occurrence of Single Nucleotide Polymorphisms (SNPs) in cancer-associated proteins is a key determinant for varied susceptibility of an individual towards cancer. METHODOLOGY The study investigated the deleterious effects of these cancer-associated SNPs on the protein's structure, stability and cancer-causing potential using an in silico approach. Numerous computational tools were employed to identify the most deleterious cancer-associated SNPs and those likely to be involved in post-translational modifications. The impact of these SNPs on the protein structure, function and stability was also examined. CONCLUSION AND FUTURE SCOPE A total of 63 non-synonymous SNPs in the ITGAV gene were observed to be associated with these three gastrointestinal cancers, and among these 63, 19 were the most deleterious. The structural and functional importance of residues altered by the most damaging SNPs was analyzed through evolutionary conservation and solvent accessibility. The study also elucidated three-dimensional structures of the 19 most damaging mutants. The analysis of conformational variation identified 5 SNPs (D379Y, G188E, G513V, L950P, and R540L) in integrin αV, which influence the protein's structure. Three calcium binding sites were predicted at residues D379, G384 and G408, and a peptide binding site at residue R369, in integrin αV. Therefore, SNPs D379Y, G384C, G408R and R369W have the potential to alter the binding properties of the protein. Screening and characterization of deleterious SNPs could advance novel biomarker discovery and therapeutic development in the future.
Affiliation(s)
- Arundhati Banerjee
  - Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, India
- Sujay Ray
  - Amity Institute of Biotechnology, Amity University, Kolkata, India
10
Saberian N, Shafi A, Peyvandipour A, Draghici S. MAGPEL: an autoMated pipeline for inferring vAriant-driven Gene PanEls from the full-length biomedical literature. Sci Rep 2020; 10:12365. [PMID: 32703994 PMCID: PMC7378213 DOI: 10.1038/s41598-020-68649-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/13/2019] [Accepted: 06/17/2020] [Indexed: 11/09/2022]
Abstract
In spite of the efforts in developing and maintaining accurate variant databases, a large number of disease-associated variants are still hidden in the biomedical literature. Curation of the biomedical literature in an effort to extract this information is a challenging task due to: (i) the complexity of natural language processing, (ii) inconsistent use of standard recommendations for variant description, and (iii) the lack of clarity and consistency in describing the variant-genotype-phenotype associations in the biomedical literature. In this article, we employ text mining and word cloud analysis techniques to address these challenges. The proposed framework extracts the variant-gene-disease associations from the full-length biomedical literature and designs an evidence-based variant-driven gene panel for a given condition. We validate the identified genes by showing their diagnostic abilities to predict the patients' clinical outcome on several independent validation cohorts. As representative examples, we present our results for acute myeloid leukemia (AML), breast cancer and prostate cancer. We compare these panels with other variant-driven gene panels obtained from Clinvar, Mastermind and others from literature, as well as with a panel identified with a classical differentially expressed genes (DEGs) approach. The results show that the panels obtained by the proposed framework yield better results than the other gene panels currently available in the literature.
Affiliation(s)
- Nafiseh Saberian
  - Department of Computer Science, Wayne State University, Detroit, MI, USA
- Adib Shafi
  - Department of Computer Science, Wayne State University, Detroit, MI, USA
- Azam Peyvandipour
  - Department of Computer Science, Wayne State University, Detroit, MI, USA
- Sorin Draghici
  - Department of Computer Science, Wayne State University, Detroit, MI, USA
  - Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, USA
11
Deschuyter M, Pennarubia F, Pinault E, Legardinier S, Maftah A. Functional Characterization of POFUT1 Variants Associated with Colorectal Cancer. Cancers (Basel) 2020; 12:1430. [PMID: 32486426 PMCID: PMC7352195 DOI: 10.3390/cancers12061430] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/06/2020] [Revised: 05/20/2020] [Accepted: 05/28/2020] [Indexed: 01/08/2023]
Abstract
BACKGROUND Protein O-fucosyltransferase 1 (POFUT1) overexpression, which is observed in many cancers such as colorectal cancer (CRC), leads to a NOTCH signaling dysregulation associated with the tumoral process. In rare CRC cases, with no POFUT1 overexpression, seven missense mutations were found in human POFUT1. METHODS Recombinant secreted forms of human WT POFUT1 and its seven mutated counterparts were produced and purified. Their O-fucosyltransferase activities were assayed in vitro using a chemo-enzymatic approach with azido-labeled GDP-fucose as a donor substrate and NOTCH1 EGF-LD26, produced in E. coli periplasm, as a relevant acceptor substrate. Targeted mass spectrometry (MS) was carried out to quantify the O-fucosyltransferase ability of all POFUT1 proteins. FINDINGS MS analyses showed a significantly higher O-fucosyltransferase activity of six POFUT1 variants (R43H, Y73C, T115A, I343V, D348N, and R364W) compared to WT POFUT1. INTERPRETATION This study provides insights on the possible involvement of these seven missense mutations in colorectal tumors. The hyperactive forms could lead to an increased O-fucosylation of POFUT1 protein targets such as NOTCH receptors in CRC patients, thereby leading to a NOTCH signaling dysregulation. It is the first demonstration of gain-of-function mutations for this crucial glycosyltransferase, modulating NOTCH activity, as well as that of other potential glycoproteins.
Affiliation(s)
- Marlène Deschuyter
  - PEIRENE, EA 7500, Glycosylation and Cell Differentiation, Faculty of Sciences and Technology, University of Limoges, F-87060 Limoges, France
- Florian Pennarubia
  - PEIRENE, EA 7500, Glycosylation and Cell Differentiation, Faculty of Sciences and Technology, University of Limoges, F-87060 Limoges, France
  - Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
- Emilie Pinault
  - PEIRENE, EA 7500, Glycosylation and Cell Differentiation, Faculty of Sciences and Technology, University of Limoges, F-87060 Limoges, France
  - BISCEm US042 INSERM—UMS 2015 CNRS, Mass Spectrometry Platform, Faculty of Medicine and Pharmacy, University of Limoges, F-87025 Limoges, France
- Sébastien Legardinier
  - PEIRENE, EA 7500, Glycosylation and Cell Differentiation, Faculty of Sciences and Technology, University of Limoges, F-87060 Limoges, France
- Abderrahman Maftah
  - PEIRENE, EA 7500, Glycosylation and Cell Differentiation, Faculty of Sciences and Technology, University of Limoges, F-87060 Limoges, France
  - Correspondence: Tel.: +33-5554-57684; Fax: +33-5554-57653
12
An Y, Zhang L, Liu W, Jiang Y, Chen X, Lan X, Li G, Hang Q, Wang J, Gusella JF, Du Y, Shen Y. De novo variants in the Helicase-C domain of CHD8 are associated with severe phenotypes including autism, language disability and overgrowth. Hum Genet 2020; 139:499-512. [PMID: 31980904 DOI: 10.1007/s00439-020-02115-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/01/2019] [Accepted: 01/06/2020] [Indexed: 01/17/2023]
Abstract
CHD8, which encodes Chromodomain helicase DNA-binding protein 8, is one of a few well-established Autism Spectrum Disorder (ASD) genes. Over 60 mutations have been reported in subjects with variable phenotypes, but little is known concerning genotype-phenotype correlations. We have identified four novel de novo mutations in Chinese subjects: two nonsense variants (c.3562C>T/p.Arg1188X, c.2065C>A/p.Glu689X), a splice site variant (c.4818-1G>A) and a missense variant (c.3502T>A/p.Tyr1168Asn). Three of these were identified from a 445-member ASD cohort by ASD gene panel sequencing of the 96 subjects who remained negative after molecular testing for copy number variation, Rett syndrome, Fragile X and tuberous sclerosis complex (TSC). The fourth (p.Glu689X) was detected separately by diagnostic trio exome sequencing. We used diagnostic instruments and a comprehensive review of phenotypes, including prenatal and postnatal growth parameters, developmental milestones, and dysmorphic features to compare these four subjects. In addition to autism, they also presented with prenatal onset macrocephaly, intellectual disability, overgrowth during puberty, sleep disorder, and dysmorphic features, including broad forehead with prominent supraorbital ridges, flat nasal bridge, telecanthus and large ears. For further comparison, we compiled a comprehensive list of CHD8 variants from the literature and databases, which revealed constitutive and somatic truncating variants in the HELIC (Helicase-C) domain in ASD and in cancer patients, respectively, but not in the general population. Furthermore, HELIC domain mutations were associated with a severe phenotype defined by a greater number of clinical features, lower verbal IQ, and a prominent, consistent pattern of overgrowth as measured by weight, height and head circumference. Overall, this study adds to the ASD-associated loss-of-function mutations in CHD8 and highlights the clinical importance of the HELIC domain of CHD8.
Collapse
Affiliation(s)
- Yu An
  - Human Phenome Institute, Fudan University, 825 Zhangheng Road, Shanghai, 201203, China
- Linna Zhang
  - Huangpu District Mental Health Center, 1162 Qu Xi Road, Shanghai, 200023, China
- Wenwen Liu
  - Shanghai Mental Health Center, Shanghai Jiaotong University School of Medicine, 600 Wan ping Nan Road, Shanghai, 200013, China
- Yunyun Jiang
  - Maternal and Child Health Hospital, Children's Hospital and Birth Defect Prevention Research Institute of Guangxi Zhuang Autonomous Region, 59 Xiangzhu Avenue, Nanning, 530002, Guangxi, China
- Xue Chen
  - Maternal and Child Health Hospital, Children's Hospital and Birth Defect Prevention Research Institute of Guangxi Zhuang Autonomous Region, 59 Xiangzhu Avenue, Nanning, 530002, Guangxi, China
- Xiaoping Lan
  - Children's Hospital of Shanghai, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Gan Li
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Qiang Hang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, 200438, China
- Jian Wang
  - Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai, China
- James F Gusella
  - Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
- Yasong Du
  - Shanghai Mental Health Center, Shanghai Jiaotong University School of Medicine, 600 Wan ping Nan Road, Shanghai, 200013, China
- Yiping Shen
  - Maternal and Child Health Hospital, Children's Hospital and Birth Defect Prevention Research Institute of Guangxi Zhuang Autonomous Region, 59 Xiangzhu Avenue, Nanning, 530002, Guangxi, China
  - Shanghai Children's Medical Center, Shanghai Jiaotong University School of Medicine, Shanghai, China
  - Molecular Neurogenetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
|
13
|
Berndt S, Gurevich VV, Iverson TM. Crystal structure of the SH3 domain of human Lyn non-receptor tyrosine kinase. PLoS One 2019; 14:e0215140. [PMID: 30969999 PMCID: PMC6457566 DOI: 10.1371/journal.pone.0215140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/01/2019] [Accepted: 03/27/2019] [Indexed: 01/07/2023] Open
Abstract
Lyn kinase (Lck/Yes related novel protein tyrosine kinase) belongs to the family of Src-related non-receptor tyrosine kinases. Consistent with physiological roles in cell growth and proliferation, aberrant function of Lyn is associated with various forms of cancer, including leukemia, breast cancer and melanoma. Here, we determine a 1.3 Å resolution crystal structure of the polyproline-binding SH3 regulatory domain of human Lyn kinase, which adopts a five-stranded β-barrel fold. Mapping of cancer-associated point mutations onto this structure reveals that these amino acid substitutions are distributed throughout the SH3 domain and may affect Lyn kinase function distinctly.
Affiliation(s)
- Sandra Berndt
  - Department of Pharmacology, Vanderbilt University, Nashville, TN, United States of America
- Vsevolod V. Gurevich
  - Department of Pharmacology, Vanderbilt University, Nashville, TN, United States of America
- T. M. Iverson
  - Department of Pharmacology, Vanderbilt University, Nashville, TN, United States of America
  - Department of Biochemistry, Vanderbilt University, Nashville, TN, United States of America
  - Vanderbilt Institute of Chemical Biology, Nashville, TN, United States of America
  - Center for Structural Biology, Nashville, TN, United States of America
|
14
|
Investigation of somatic single nucleotide variations in human endogenous retrovirus elements and their potential association with cancer. PLoS One 2019; 14:e0213770. [PMID: 30934003 PMCID: PMC6443178 DOI: 10.1371/journal.pone.0213770] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/03/2018] [Accepted: 02/28/2019] [Indexed: 11/19/2022] Open
Abstract
Human endogenous retroviruses (HERVs) have been investigated for potential links with human cancer. However, the distribution of somatic nucleotide variations in HERV elements has not been explored in detail. This study aims to identify HERV elements with an over-representation of somatic mutations (hot spots) in cancer patients. Four HERV elements with mutation hotspots were identified that overlap with exons of four human protein coding genes. These hotspots were identified based on the significant over-representation (p<8.62e-4) of non-synonymous single-nucleotide variations (nsSNVs). These genes are TNN (HERV-9/LTR12), OR4K15 (HERV-IP10F/LTR10F), ZNF99 (HERV-W/HERV17/LTR17), and KIR2DL1 (MST/MaLR). In an effort to identify mutations that affect survival, all nsSNVs were further evaluated and it was found that kidney cancer patients with mutation C2270G in ZNF99 have a significantly lower survival rate (hazard ratio = 2.6) compared to those without it. Among HERV elements in the human non-protein coding regions, we found 788 HERVs with significantly elevated numbers of somatic single-nucleotide variations (SNVs) (p<1.60e-5). From this category the top three HERV elements with significantly over-represented SNVs are HERV-H/LTR7, HERV-9/LTR12 and HERV-L/MLT2. The majority of the SNVs in these 788 HERV elements are located in three DNA functional groups: long non-coding RNAs (lncRNAs) (60%), introns (22.2%) and transcription factor binding sites (TFBS) (14.8%). This study provides a list of mutational hotspots in HERVs, which could potentially be used as biomarkers and therapeutic targets.
|
15
|
WITHDRAWN: A novel insight of Asp193His mutation on epigenetic methyltransferase activity of human EZH2 protein: An in-silico approach. INFORMATICS IN MEDICINE UNLOCKED 2019. [DOI: 10.1016/j.imu.2019.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
|
16
|
Gautam N, Kaur S, Kaur K, Kumar N. A novel insight of Asp193His mutation on epigenetic methyltransferase activity of human EZH2 protein: An in-silico approach. Meta Gene 2019. [DOI: 10.1016/j.mgene.2019.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022] Open
|
17
|
Rambold G, Yilmaz P, Harjes J, Klaster S, Sanz V, Link A, Glöckner FO, Triebel D. Meta-omics data and collection objects (MOD-CO): a conceptual schema and data model for processing sample data in meta-omics research. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019; 2019:5303972. [PMID: 30715273 PMCID: PMC6354027 DOI: 10.1093/database/baz002] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/09/2018] [Accepted: 01/07/2019] [Indexed: 12/16/2022]
Abstract
With the advent of advanced molecular meta-omics techniques and methods, a new era commenced for analysing and characterizing historic collection specimens, as well as recently collected environmental samples. Nucleic acid and protein sequencing-based analyses are increasingly applied to determine the origin, identity and traits of environmental (biological) objects and organisms. In this context, the need for new data structures is evident and former approaches for data processing need to be expanded according to the new meta-omics techniques and operational standards. Existing schemas and community standards in the biodiversity and molecular domain concentrate on terms important for data exchange and publication. Detailed operational aspects of origin and laboratory as well as object and data management issues are frequently neglected. Meta-omics Data and Collection Objects (MOD-CO) has therefore been set up as a new schema for meta-omics research, with a hierarchical organization of the concepts describing collection samples, as well as products and data objects being generated during operational workflows. It is focussed on object trait descriptions as well as on operational aspects and thereby may serve as a backbone for R&D laboratory information management systems with functions of an electronic laboratory notebook. The schema in its current version 1.0 includes 653 concepts and 1810 predefined concept values, being equivalent to descriptors and descriptor states, respectively. It is published in several representations, like a Semantic Media Wiki publication with 2463 interlinked Wiki pages for concepts and concept values, being grouped in 37 concept collections and subcollections. The SQL database application DiversityDescriptions, a generic tool for maintaining descriptive data and schemas, has been applied for setting up and testing MOD-CO and for concept mapping on elements of corresponding schemas.
Affiliation(s)
- Gerhard Rambold
  - University of Bayreuth, Universitätsstraße 30, Bayreuth, Germany
- Pelin Yilmaz
  - Max Planck Institute for Marine Microbiology, Celsiusstraße 1, Bremen, Germany
- Janno Harjes
  - University of Bayreuth, Universitätsstraße 30, Bayreuth, Germany
- Sabrina Klaster
  - University of Bayreuth, Universitätsstraße 30, Bayreuth, Germany
- Veronica Sanz
  - University of Bayreuth, Universitätsstraße 30, Bayreuth, Germany
  - SNSB IT Center, Menzinger Straße 67, München, Germany
- Anton Link
  - SNSB IT Center, Menzinger Straße 67, München, Germany
- Frank Oliver Glöckner
  - Max Planck Institute for Marine Microbiology, Celsiusstraße 1, Bremen, Germany
  - Jacobs University, Campus Ring 1, Bremen, Germany
|
18
|
Rashid MI, Ali A, Andleeb S. Functional Annotation and Analysis of Dual Oxidase 1 (DUOX1): a Potential Anti-pyocyanin Immune Component. Interdiscip Sci 2018; 11:597-610. [PMID: 30483939 DOI: 10.1007/s12539-018-0308-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2018] [Revised: 10/24/2018] [Accepted: 10/24/2018] [Indexed: 11/27/2022]
Abstract
Dual Oxidase 1 (DUOX1) is a prominent immune system component primarily expressed in the esophagus, lungs, skin, and urinary bladder, among others. DUOX1 is involved in lactoperoxidase-mediated innate immunity at mucosal surfaces by generation of antimicrobial hypothiocyanite at the apical surface of the epithelial lining. Upon detection of bacterial pathogens, mainly Pseudomonas aeruginosa, DUOX1 is activated in bronchial epithelial cells. Both the host and pathogen enter a redox duel, with DUOX1 and hypothiocyanite from the host and pyocyanin (PCN) as a redox-active virulence factor from P. aeruginosa. The synergy of both enzymes permanently oxidizes PCN and thus holds the potential to prevent PCN-induced virulence, which otherwise paves the way for establishment of persistent chronic infection. In this study, we structurally and functionally annotated DUOX1 and predicted its 3D structure, physicochemical properties, and post-translational modifications, and performed genetic polymorphism analysis with subsequent disease-associated single-nucleotide variations and their impact on DUOX1 functionality by employing in silico approaches. DUOX1 holds greater homology with gorilla and chimpanzee than other primates. The localization signal peptide was present at the beginning of the peptide with a cleavage site at the 22 aa position. Three distinct functional domains were observed based on homology: An_peroxidase, FRQ1, and oxido-reductase domains. Polymorphism analysis revealed > 60 SNPs associated with different cancers with probable damaging effects. No cancer-associated methylated island was observed for DUOX1. The three-dimensional structure was developed via a homology modeling strategy. The proper annotation will help in characterization of DUOX1 and enhance our knowledge of its functionality and biological roles.
Affiliation(s)
- Muhammad Ibrahim Rashid
  - Department of Industrial Biotechnology, Atta ur Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Amjad Ali
  - Department of Industrial Biotechnology, Atta ur Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Saadia Andleeb
  - Department of Industrial Biotechnology, Atta ur Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
|
19
|
Kannan S, Tan DSW, Verma CS. Effects of Single Nucleotide Polymorphisms on the Binding of Afatinib to EGFR: A Potential Patient Stratification Factor Revealed by Modeling Studies. J Chem Inf Model 2018; 59:309-315. [DOI: 10.1021/acs.jcim.8b00491] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/19/2023]
Affiliation(s)
- Srinivasaraghavan Kannan
  - Bioinformatics Institute (A*STAR), 30 Biopolis Street, 07-01 matrix, Singapore 138671, Singapore
- Daniel Shao-Weng Tan
  - Division of Medical Oncology, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610, Singapore
  - Cancer Therapeutics Research Laboratory, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610, Singapore
  - Cancer Stem Cell Biology, Genome Institute of Singapore, 60 Biopolis Street, 02-01, Singapore 138672, Singapore
- Chandra Shekhar Verma
  - Bioinformatics Institute (A*STAR), 30 Biopolis Street, 07-01 matrix, Singapore 138671, Singapore
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
  - Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543, Singapore
|
20
|
Szabó B, Murvai N, Abukhairan R, Schád É, Kardos J, Szeder B, Buday L, Tantos Á. Disordered Regions of Mixed Lineage Leukemia 4 (MLL4) Protein Are Capable of RNA Binding. Int J Mol Sci 2018; 19:ijms19113478. [PMID: 30400675 PMCID: PMC6274713 DOI: 10.3390/ijms19113478] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/30/2018] [Revised: 11/01/2018] [Accepted: 11/02/2018] [Indexed: 01/11/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are emerging as important regulators of cellular processes and are extensively involved in the development of different cancers, including leukemias. As one of the accepted methods of lncRNA function is affecting chromatin structure, lncRNA binding has been shown for different chromatin modifiers. Histone lysine methyltransferases (HKMTs) are also subject to lncRNA regulation, as demonstrated, for example, in the case of Polycomb Repressive Complex 2 (PRC2). Mixed Lineage Leukemia (MLL) proteins that catalyze the methylation of H3K4 have been implicated in several different cancers, yet many details of their regulation and targeting remain elusive. In this work we explored the RNA binding capability of two so far uncharacterized regions of MLL4, with the aim of shedding light on the existence of possible regulatory lncRNA interactions of the protein. We demonstrated that both regions, one that contains a predicted RNA binding sequence and one that does not, are capable of binding to different RNA constructs in vitro. To our knowledge, these findings are the first to indicate that an MLL protein itself is capable of lncRNA binding.
Affiliation(s)
- Beáta Szabó
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Nikoletta Murvai
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Rawan Abukhairan
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Éva Schád
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- József Kardos
  - ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Eötvös Loránd University, H-1117 Budapest, Hungary
- Bálint Szeder
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- László Buday
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Ágnes Tantos
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
|
21
|
Kordopati V, Salhi A, Razali R, Radovanovic A, Tifratene F, Uludag M, Li Y, Bokhari A, AlSaieedi A, Bin Raies A, Van Neste C, Essack M, Bajic VB. DES-Mutation: System for Exploring Links of Mutations and Diseases. Sci Rep 2018; 8:13359. [PMID: 30190574 PMCID: PMC6127254 DOI: 10.1038/s41598-018-31439-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/18/2017] [Accepted: 08/17/2018] [Indexed: 12/17/2022] Open
Abstract
During cellular division DNA replicates and this process is the basis for passing genetic information to the next generation. However, the DNA copy process sometimes produces a copy that is not perfect, that is, one with mutations. The collection of all such mutations in the DNA copy of an organism makes it unique and determines the organism’s phenotype. However, mutations are often the cause of diseases. Thus, it is useful to have the capability to explore links between mutations and disease. We approached this problem by analyzing a vast amount of published information linking mutations to disease states. Based on such information, we developed the DES-Mutation knowledgebase which allows for exploration of not only mutation-disease links, but also links between mutations and concepts from 27 topic-specific dictionaries such as human genes/proteins, toxins, pathogens, etc. This allows for a more detailed insight into mutation-disease links and context. On a sample of 600 mutation-disease associations predicted and curated, our system achieves precision of 72.83%. To demonstrate the utility of DES-Mutation, we provide case studies related to known or potentially novel information involving disease mutations. To our knowledge, this is the first mutation-disease knowledgebase dedicated to the exploration of this topic through text-mining and data-mining of different mutation types and their associations with terms from multiple thematic dictionaries.
Affiliation(s)
- Vasiliki Kordopati
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Adil Salhi
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Rozaimi Razali
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Aleksandar Radovanovic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Faroug Tifratene
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Mahmut Uludag
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Yu Li
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Ameerah Bokhari
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Ahdab AlSaieedi
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
  - King Abdulaziz University (KAU), Faculty of Applied Medical Sciences (FAMS), Department of Medical Laboratory Technology (MLT), Jeddah, 21589-80324, Saudi Arabia
- Arwa Bin Raies
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Christophe Van Neste
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
  - Ghent University, Center for Medical Genetics Ghent (CMGG), B-9000, Ghent, Belgium
- Magbubah Essack
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Vladimir B Bajic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
|
22
|
Abstract
Despite availability of sequence site-specific information resulting from years of sequencing and sequence feature curation, there have been few efforts to integrate and annotate this information. In this study, we update the number of human N-linked glycosylation sequons (NLGs), and we investigate cancer-relatedness of glycosylation-impacting somatic nonsynonymous single-nucleotide variation (nsSNV) by mapping human NLGs to cancer variation data and reporting the expected loss or gain of glycosylation sequon. We find 75.8% of all human proteins have at least one NLG for a total of 59,341 unique NLGs (includes predicted and experimentally validated). Only 27.4% of all NLGs are experimentally validated sites on 4,412 glycoproteins. With respect to cancer, 8,895 somatic-only nsSNVs abolish NLGs in 5,204 proteins and 12,939 somatic-only nsSNVs create NLGs in 7,356 proteins in cancer samples. nsSNVs causing loss of 24 NLGs on 23 glycoproteins and nsSNVs creating 41 NLGs on 40 glycoproteins are identified in three or more cancers. Of all identified cancer somatic variants causing potential loss or gain of glycosylation, only 36 have previously known disease associations. Although this work is computational, it builds on existing genomics and glycobiology research to promote identification and rank potential cancer nsSNV biomarkers for experimental validation.
|
23
|
Dingerdissen HM, Torcivia-Rodriguez J, Hu Y, Chang TC, Mazumder R, Kahsay R. BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery. Nucleic Acids Res 2018; 46:D1128-D1136. [PMID: 30053270 PMCID: PMC5753215 DOI: 10.1093/nar/gkx907] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 08/15/2017] [Revised: 09/21/2017] [Accepted: 09/26/2017] [Indexed: 12/29/2022] Open
Abstract
Single-nucleotide variation and gene expression of disease samples represent important resources for biomarker discovery. Many databases have been built to host and make available such data to the community, but these databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily through the aggregation of data across many studies into a single source with a unified representation and annotation of functional attributes. Early versions of these resources were initiated by pilot funding for specific research applications, but newly awarded funds have enabled hardening of these databases to production-level quality and will allow for sustained development of these resources for the next few years. Because both resources were developed using a similar methodology of integration, curation, unification, and annotation, we present BioMuta and BioXpress as allied databases that will facilitate a more comprehensive view of gene associations in cancer. BioMuta and BioXpress are hosted on the High-performance Integrated Virtual Environment (HIVE) server at the George Washington University at https://hive.biochemistry.gwu.edu/biomuta and https://hive.biochemistry.gwu.edu/bioxpress, respectively.
Affiliation(s)
- Hayley M Dingerdissen
  - The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
- John Torcivia-Rodriguez
  - The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
- Yu Hu
  - The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
- Ting-Chia Chang
  - The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
- Raja Mazumder
  - The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
  - McCormick Genomic and Proteomic Center, The George Washington University, Washington, DC 20037, USA
- Robel Kahsay
  - The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, USA
|
24
|
Mahmood ASMA, Rao S, McGarvey P, Wu C, Madhavan S, Vijay-Shanker K. eGARD: Extracting associations between genomic anomalies and drug responses from text. PLoS One 2017; 12:e0189663. [PMID: 29261751 PMCID: PMC5738129 DOI: 10.1371/journal.pone.0189663] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/26/2017] [Accepted: 11/29/2017] [Indexed: 12/25/2022] Open
Abstract
Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially information that describes therapeutic implications of biomarkers and is therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for 'best-fit' therapies and readily generate hypotheses for new clinical trials.
Affiliation(s)
- A. S. M. Ashique Mahmood
  - Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America
- Shruti Rao
  - Innovation Center For Biomedical Informatics, Georgetown University, Washington D.C, United States of America
- Peter McGarvey
  - Innovation Center For Biomedical Informatics, Georgetown University, Washington D.C, United States of America
  - Protein Information Resource, Georgetown University Medical Center, Washington D.C, United States of America
- Cathy Wu
  - Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America
  - Protein Information Resource, Georgetown University Medical Center, Washington D.C, United States of America
  - Center for Bioinformatics and Computational Biology, University of Delaware, Newark, Delaware, United States of America
- Subha Madhavan
  - Innovation Center For Biomedical Informatics, Georgetown University, Washington D.C, United States of America
  - Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington D.C, United States of America
- K. Vijay-Shanker
  - Department of Computer and Information Science, University of Delaware, Newark, Delaware, United States of America
|
25
|
3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures. Molecules 2017; 22:molecules22122230. [PMID: 29244774 PMCID: PMC6149929 DOI: 10.3390/molecules22122230] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/31/2017] [Revised: 12/11/2017] [Accepted: 12/13/2017] [Indexed: 11/16/2022] Open
Abstract
Many studies have used position-specific scoring matrices (PSSM) profiles to characterize residues in protein structures and to predict a broad range of protein features. Moreover, PSSM profiles of Protein Data Bank (PDB) entries have been recalculated in many works for different purposes. Although the computational cost of calculating a single PSSM profile is affordable, many statistical studies or machine learning-based methods used thousands of profiles to achieve their goals, thereby leading to a substantial increase of the computational cost. In this work we present a new database compiling PSSM profiles for the proteins of the PDB. Currently, the database contains 333,532 protein chain profiles involving 123,135 different PDB entries.
|
26
|
Park J, Han D, Do M, Woo J, Wang JI, Han Y, Kwon W, Kim SW, Jang JY, Kim Y. Proteome characterization of human pancreatic cyst fluid from intraductal papillary mucinous neoplasm by liquid chromatography/tandem mass spectrometry. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2017; 31:1761-1772. [PMID: 28815810 DOI: 10.1002/rcm.7959] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/12/2017] [Revised: 07/12/2017] [Accepted: 08/11/2017] [Indexed: 06/07/2023]
Abstract
RATIONALE In recent years, the molecular components of pancreatic cyst fluid have been used for diagnosis and prognosis. Because the protein markers that are currently used in clinical tests are unreliable, proteomic studies to find new protein markers are being conducted. However, such researches have been limited due to the complexity of pancreatic cyst fluid and the immaturity of proteomic techniques. METHODS To overcome these limitations and provide a pancreatic cyst proteome dataset, we examined cyst fluid proteome with tandem mass spectrometry. The proteomic analysis was performed using a Orbitrap-based mass spectrometer (Q-Exactive) coupled with a 50-cm-long nano-liquid chromatography column. Protein mutations were identified using mutation sequence database search. RESULTS A total of 5850 protein groups were identified from microliters of cyst fluid. Among those, 3934 protein groups were reported for the first time in pancreatic cyst fluid. Although high-abundance proteins were not depleted in the experiment, our dataset detected almost all pancreatic tumor markers such as mucin family members, S100 proteins, and CEA-related proteins. In addition, 590 protein mutation marker candidates were discovered. CONCLUSIONS We provide a comprehensive cyst proteome dataset that includes cystic cellular proteins and mutated proteins. Our findings would serve as a rich resource for further IPMN studies and clinical applications. The MS data have been deposited in the ProteomeXchange with identifier PXD005671 (http://proteomecentral.proteomexchange.org/dataset/PXD005671).
MESH Headings
- Amino Acid Sequence
- Biomarkers, Tumor/analysis
- Carcinoma, Pancreatic Ductal/chemistry
- Carcinoma, Pancreatic Ductal/pathology
- Chromatography, Liquid/methods
- Cyst Fluid/chemistry
- Humans
- Neoplasms, Cystic, Mucinous, and Serous/chemistry
- Neoplasms, Cystic, Mucinous, and Serous/pathology
- Pancreas/chemistry
- Pancreas/pathology
- Pancreatic Cyst/chemistry
- Pancreatic Cyst/pathology
- Pancreatic Neoplasms/chemistry
- Pancreatic Neoplasms/pathology
- Proteome/analysis
- Proteomics/methods
- Tandem Mass Spectrometry/methods
Affiliation(s)
- Joonho Park
- Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Dohyun Han
- Biomedical Research Institute, Seoul National University Hospital, 101 Daehak-ro, Seoul, Korea
- Misol Do
- Department of Biomedical Sciences, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Jongmin Woo
- Department of Biomedical Sciences, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Joseph I Wang
- Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Youngmin Han
- Department of Surgery, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Wooil Kwon
- Department of Surgery, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Sun-Whe Kim
- Department of Surgery, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Jin-Young Jang
- Department of Surgery, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Youngsoo Kim
- Department of Biomedical Engineering, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
- Department of Biomedical Sciences, Seoul National University College of Medicine, 103 Daehak-ro, Seoul, Korea
|
27
|
Zhang J, Kinch LN, Cong Q, Weile J, Sun S, Cote AG, Roth FP, Grishin NV. Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I. Hum Mutat 2017; 38:1051-1063. [PMID: 28817247 PMCID: PMC5746193 DOI: 10.1002/humu.23293] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/29/2017] [Revised: 06/29/2017] [Accepted: 06/30/2017] [Indexed: 11/07/2022]
Abstract
The exponential growth of genomic variants uncovered by next-generation sequencing necessitates efficient and accurate computational analyses to predict their functional effects. A number of computational methods have been developed for the task, but few unbiased comparisons of their performance are available. To fill the gap, the Critical Assessment of Genome Interpretation (CAGI) comprehensively assesses phenotypic predictions on newly collected experimental datasets. Here, we present the results of the SUMO conjugase challenge, in which participants predicted the functional effects of missense mutations in human SUMO-conjugating enzyme UBE2I. The predictors perform similarly to one another and are far from perfect. Evolutionary information from sequence alignments dominates the success: deleterious mutations at conserved positions and benign mutations at variable positions are accurately predicted. Prediction accuracy for other mutations remains unsatisfactory, and this fast-growing field of research has yet to learn how to use spatial structure information to improve the predictions significantly.
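The role of alignment-derived conservation described in this abstract can be illustrated with a per-column score. The following is a minimal sketch, not the method of any CAGI participant: it assumes a toy set of pre-aligned, equal-length sequences and uses Shannon entropy as the conservation measure. The function names and the sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Entropy per column for equal-length aligned sequences (0 = fully conserved)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKV", "MRV", "MKL"]
profile = conservation_profile(toy_alignment)
```

In this toy case the first column is fully conserved (entropy 0) and the other two are variable; under the abstract's finding, a substitution at the first position would be the more plausible candidate for a deleterious effect.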
Affiliation(s)
- Jing Zhang
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-8816, USA
- Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9050, USA
- Qian Cong
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-8816, USA
- Jochen Weile
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Song Sun
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Atina G Cote
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Frederick P. Roth
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9050, USA
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-8816, USA
|
28
|
Cancer-associated mutations in the canonical cleavage site do not influence CD99 shedding by the metalloprotease meprin β but alter cell migration in vitro. Oncotarget 2017; 8:54873-54888. [PMID: 28903388 PMCID: PMC5589627 DOI: 10.18632/oncotarget.18966] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/18/2017] [Accepted: 06/17/2017] [Indexed: 01/22/2023] Open
Abstract
Transendothelial cell migration (TEM) is crucial for inflammation and metastasis. The adhesion molecule CD99 was shown to be important for correct immune cell extravasation and is highly expressed on certain cancer cells. Recently, we demonstrated that ectodomain shedding of CD99 by the metalloprotease meprin β promotes TEM in vitro. In this study, we employed an acute inflammation model (air pouch/carrageenan) and found significantly fewer infiltrated cells in meprin β knock-out animals, validating the previously observed pro-inflammatory activity. To further analyze the impact of meprin β on CD99 shedding with regard to cell adhesion and proliferation, we characterized two lung cancer-associated CD99 variants (D92H, D92Y) carrying point mutations at the main cleavage site. Interestingly, ectodomain shedding of these variants by meprin β was still detectable; however, the cleavage site shifted to adjacent positions. Nevertheless, expression of CD99 variants D92H and D92Y revealed partial misfolding and proteasomal degradation. A previously observed influence of CD99 on Src activation and increased proliferation could not be confirmed in this study, independent of wild-type CD99 or the variants D92H and D92Y. However, we identified meprin β as a potent inducer of Src phosphorylation. Importantly, we found significantly increased cell migration when expressing the cancer-associated CD99 variant D92H compared to the wild-type protein.
|
29
|
A review on mass spectrometry-based quantitative proteomics: Targeted and data independent acquisition. Anal Chim Acta 2017; 964:7-23. [DOI: 10.1016/j.aca.2017.01.059] [Citation(s) in RCA: 248] [Impact Index Per Article: 31.0] [Received: 08/12/2016] [Revised: 01/03/2017] [Accepted: 01/05/2017] [Indexed: 01/18/2023]
|
30
|
Goldweber S, Theodore J, Torcivia-Rodriguez J, Simonyan V, Mazumder R. Pubcast and Genecast: Browsing and Exploring Publications and Associated Curated Content in Biology Through Mobile Devices. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:498-500. [PMID: 28113865 DOI: 10.1109/tcbb.2016.2542802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Services such as Facebook, Amazon, and eBay were once solely accessed from stationary computers. These web services are now being used increasingly on mobile devices. We acknowledge this new reality by providing users a way to access publications and a curated cancer mutation database on their mobile device with daily automated updates. AVAILABILITY http://hive.biochemistry.gwu.edu/tools/HivePubcast.
|
31
|
Pan Y, Yan C, Hu Y, Fan Y, Pan Q, Wan Q, Torcivia-Rodriguez J, Mazumder R. Distribution bias analysis of germline and somatic single-nucleotide variations that impact protein functional site and neighboring amino acids. Sci Rep 2017; 7:42169. [PMID: 28176830 PMCID: PMC5296879 DOI: 10.1038/srep42169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/07/2016] [Accepted: 01/05/2017] [Indexed: 01/13/2023] Open
Abstract
Single nucleotide variations (SNVs) can result in loss or gain of protein functional sites. We analyzed the effects of SNVs on enzyme active sites, ligand binding sites, and various types of post-translational modification (PTM) sites. We found that, for most types of protein functional sites, the SNV pattern differs between germline and somatic mutations as well as between synonymous and non-synonymous mutations. From a total of 51,138 protein functional site-affecting SNVs (pfsSNVs), a pan-cancer analysis revealed 142 somatic pfsSNVs in five or more cancer types. By leveraging patient information for somatic pfsSNVs, we identified 17 loss-of-functional-site SNVs and 60 gain-of-functional-site SNVs that are significantly enriched in patients with specific cancer types. Of the key pfsSNVs identified in our analysis above, we highlight 132 key pfsSNVs within 17 genes that are found in well-established cancer-associated gene lists. To illustrate how key pfsSNVs can be prioritized further, we provide a use case in which we performed survival analysis showing that a loss-of-phosphorylation-site pfsSNV at position 105 in MEF2A is significantly associated with decreased pancreatic cancer patient survival rate. These 132 pfsSNVs can be used in developing genetic testing pipelines.
Affiliation(s)
- Yang Pan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Cheng Yan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Yu Hu
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Yu Fan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Qing Pan
- The Department of Statistics, The George Washington University, Washington, DC 20037, United States of America
- Quan Wan
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- John Torcivia-Rodriguez
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America
- Raja Mazumder
- The Department of Biochemistry & Molecular Medicine, The George Washington University Medical Center, Washington, DC 20037, United States of America; McCormick Genomic and Proteomic Center, The George Washington University, Washington, DC 20037, United States of America
|
32
|
Impact of Nonsynonymous Single-Nucleotide Variations on Post-Translational Modification Sites in Human Proteins. Methods Mol Biol 2017. [PMID: 28150238 DOI: 10.1007/978-1-4939-6783-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
Abstract
Post-translational modifications (PTMs) are covalent modifications that proteins might undergo following or sometimes during the process of translation. Together with gene diversity, PTMs contribute to the overall variety of possible protein function for a given organism. Single-nucleotide polymorphisms (SNPs) are the most common form of variations found in the human genome, and have been found to be associated with diseases like Alzheimer's disease (AD) and Parkinson's disease (PD), among many others. Studies have also shown that non-synonymous single-nucleotide variation (nsSNV) at the PTM site, which alters the corresponding encoded amino acid in the translated protein sequence, can lead to abnormal activity of a protein and can contribute to a disease phenotype. Significant advances in next-generation sequencing (NGS) technologies and high-throughput proteomics have resulted in the generation of a huge amount of data for both SNPs and PTMs. However, these data are unsystematically distributed across a number of diverse databases. Thus, there is a need for efforts toward data standardization and validation of bioinformatics algorithms that can fully leverage SNP and PTM information for biomedical research. In this book chapter, we will present some of the commonly used databases for both SNVs and PTMs and describe a broad approach that can be applied to many scenarios for studying the impact of nsSNVs on PTM sites of human proteins.
|
33
|
Abstract
The main databases devoted stricto sensu to cancer cytogenetics are the "Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer" (http://cgap.nci.nih.gov/Chromosomes/Mitelman), the "Atlas of Genetics and Cytogenetics in Oncology and Haematology" (http://atlasgeneticsoncology.org), and COSMIC (http://cancer.sanger.ac.uk/cosmic). However, being a complex multistep process, cancer cytogenetics are broadened to "cytogenomics," with complementary resources on: general databases (nucleic acid and protein sequences databases; cartography browsers: GenBank, RefSeq, UCSC, Ensembl, UniProtKB, and Entrez Gene), cancer genomic portals associated with recent international integrated programs, such as TCGA or ICGC, other fusion genes databases, array CGH databases, copy number variation databases, and mutation databases. Other resources such as the International System for Human Cytogenomic Nomenclature (ISCN), the International Classification of Diseases for Oncology (ICD-O), and the Human Gene Nomenclature Database (HGNC) allow a common language. Data within the scientific/medical community should be freely available. However, most of the institutional stakeholders are now gradually disengaging, and well-known databases are forced to beg or to disappear (which may happen!).
|
34
|
Abstract
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
Affiliation(s)
- Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Protein Information Resource, Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA
|
35
|
Abstract
Chemical tools have accelerated progress in glycoscience, reducing experimental barriers to studying protein glycosylation, the most widespread and complex form of posttranslational modification. For example, chemical glycoproteomics technologies have enabled the identification of specific glycosylation sites and glycan structures that modulate protein function in a number of biological processes. This field is now entering a stage of logarithmic growth, during which chemical innovations combined with mass spectrometry advances could make it possible to fully characterize the human glycoproteome. In this review, we describe the important role that chemical glycoproteomics methods are playing in such efforts. We summarize developments in four key areas: enrichment of glycoproteins and glycopeptides from complex mixtures, emphasizing methods that exploit unique chemical properties of glycans or introduce unnatural functional groups through metabolic labeling and chemoenzymatic tagging; identification of sites of protein glycosylation; targeted glycoproteomics; and functional glycoproteomics, with a focus on probing interactions between glycoproteins and glycan-binding proteins. Our goal with this survey is to provide a foundation on which continued technological advancements can be made to promote further explorations of protein glycosylation.
Affiliation(s)
- Krishnan K. Palaniappan
- Verily Life Sciences, 269 East Grand Ave., South San Francisco, California 94080, United States
- Carolyn R. Bertozzi
- Department of Chemistry, Stanford University, Stanford, California 94305, United States
- Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, United States
|
36
|
Zhao B, Xue B. Self-regulation of functional pathways by motifs inside the disordered tails of beta-catenin. BMC Genomics 2016; 17 Suppl 5:484. [PMID: 27585692 PMCID: PMC5009561 DOI: 10.1186/s12864-016-2825-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Beta-catenin has two major functions: coordinating cell-cell adhesion by interacting with cadherin in the cadherin junction formation pathway, and regulating gene expression through the Wnt signaling pathway. Accomplishing these two functions requires synergistic action of various sequential regions of the same beta-catenin molecule, including the N-terminal tail, the middle armadillo domain, and the C-terminal tail. Although the middle armadillo domain is the major functional unit of beta-catenin, the involvement of the tails in regulating the interaction between beta-catenin and its partners has been well observed. Nonetheless, the regulatory processes of both tails are still elusive. In addition, it is interesting to note that the three sequential regions have different structural features: the middle armadillo domain is structured, but both the N- and C-terminal tails are disordered. This observation leads to another important question about the functions and mechanisms of disordered tails, which are also largely unknown. RESULTS In this study, we focused on the characterization of sequential, structural, and functional features of the disordered tails of beta-catenin. We identified multiple functional motifs and conserved sequence motifs in the disordered tails, discovered the correlation between cancer-associated mutations and functional motifs, explored the abundance of protein intrinsic disorder in the interactomes of beta-catenin, and elaborated a working model on the regulatory roles of disordered tails in the functional pathways of beta-catenin. CONCLUSION Disordered tails of beta-catenin contain multiple functional motifs. These motifs interact with each other and with the armadillo domain of beta-catenin to regulate the function of beta-catenin in both the cadherin junction formation pathway and the Wnt signaling pathway.
Affiliation(s)
- Bi Zhao
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 E. Fowler Ave, ISA 2015, Tampa, FL 33620, USA
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 E. Fowler Ave, ISA 2015, Tampa, FL 33620, USA
|
37
|
Lazar T, Schad E, Szabo B, Horvath T, Meszaros A, Tompa P, Tantos A. Intrinsic protein disorder in histone lysine methylation. Biol Direct 2016; 11:30. [PMID: 27356874 PMCID: PMC4928265 DOI: 10.1186/s13062-016-0129-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/02/2016] [Accepted: 05/17/2016] [Indexed: 11/21/2022] Open
Abstract
Histone lysine methyltransferases (HKMTs) catalyze mono-, di- and trimethylation of lysine residues, resulting in a regulatory pattern that controls gene expression. Their involvement in many different cellular processes and diseases makes HKMTs an intensively studied protein group, but scientific interest so far has been concentrated mostly on their catalytic domains. In this work we set out to analyze the structural heterogeneity of human HKMTs and found that many contain long intrinsically disordered regions (IDRs) that are conserved across vertebrate species. Our predictions show that these IDRs contain several linear motifs and conserved putative binding sites that harbor cancer-related SNPs. Although only limited data are available in the literature, some of the predicted binding regions overlap with interacting segments identified experimentally. The importance of a disordered binding site is illustrated through the example of the ternary complex between MLL1, menin and LEDGF/p75. Our suggestion is that intrinsic protein disorder plays an as yet unrecognized role in epigenetic regulation, which needs to be further elucidated through structural and functional studies aimed specifically at the disordered regions of HKMTs. Reviewers: This article was reviewed by Arne Elofsson and Piotr Zielenkiewicz.
Affiliation(s)
- Tamas Lazar
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary; Pázmány Péter Catholic University, Faculty of Information Technology and Bionics, Práter utca 50/a, 1083, Budapest, Hungary
- Eva Schad
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary
- Beata Szabo
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary
- Tamas Horvath
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary
- Attila Meszaros
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary
- Peter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary; VIB Structural Biology Research Center (SBRC), Pleinlaan 2, 1050, Brussels, Belgium; Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- Agnes Tantos
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok körútja 2, 1117, Budapest, Hungary
|
38
|
Huwe PJ, Xu Q, Shapovalov MV, Modi V, Andrake MD, Dunbrack RL. Biological function derived from predicted structures in CASP11. Proteins 2016; 84 Suppl 1:370-91. [PMID: 27181425 DOI: 10.1002/prot.24997] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/11/2015] [Revised: 01/10/2016] [Accepted: 01/18/2016] [Indexed: 12/26/2022]
Abstract
In CASP11, the organizers sought to bring the biological inferences from predicted structures to the fore. To accomplish this, we assessed the models for their ability to perform quantifiable tasks related to biological function. First, for 10 targets that were probable homodimers, we measured the accuracy of docking the models into homodimers as a function of GDT-TS of the monomers, which produced characteristic L-shaped plots. At low GDT-TS, none of the models could be docked correctly as homodimers. Above GDT-TS of ∼60%, some models formed correct homodimers in one of the largest docked clusters, while many other models at the same values of GDT-TS did not. Docking was more successful when many of the templates shared the same homodimer. Second, we docked a ligand from an experimental structure into each of the models of one of the targets. Docking to the models with two different programs produced poor ligand RMSDs with the experimental structure. Measures that evaluated similarity of contacts were reasonable for some of the models, although there was not a significant correlation with model accuracy. Finally, we assessed whether models would be useful in predicting the phenotypes of missense mutations in three human targets by comparing features calculated from the models with those calculated from the experimental structures. The models were successful in reproducing accessible surface areas but there was little correlation of model accuracy with calculation of FoldX evaluation of the change in free energy between the wild-type and the mutant. Proteins 2016; 84(Suppl 1):370-391. © 2016 Wiley Periodicals, Inc.
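The GDT-TS values referred to in this abstract average the fraction of residues that fall within 1, 2, 4, and 8 Å of their experimental positions. The sketch below is a simplification and not the CASP assessment code: the full measure searches over many superpositions for each cutoff, whereas this version assumes the per-residue Cα distances have already been computed after a single superposition. The function name and the example distances are hypothetical.

```python
def gdt_ts(distances, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS (percent) from per-residue C-alpha distances in angstroms.

    Assumes the model is already superimposed on the experimental structure;
    the full measure optimizes the superposition separately for each cutoff.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= t for d in distances) / n for t in thresholds]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical distances for a five-residue model, for illustration only.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
score = gdt_ts(example_distances)
```

For these made-up distances the score is about 50%, below the roughly 60% level at which the abstract reports that some models began to dock correctly as homodimers.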
Affiliation(s)
- Peter J Huwe
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Qifang Xu
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Vivek Modi
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Mark D Andrake
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
|
39
|
Mahmood ASMA, Wu TJ, Mazumder R, Vijay-Shanker K. DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS One 2016; 11:e0152725. [PMID: 27073839 PMCID: PMC4830514 DOI: 10.1371/journal.pone.0152725] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 04/17/2015] [Accepted: 03/19/2016] [Indexed: 11/22/2022] Open
Abstract
The number of published articles describing associations between mutations and diseases is increasing at a fast pace. There is a pressing need to gather such mutation-disease associations into public knowledge bases, but manual curation slows down the growth of such databases. We have addressed this problem by developing a text-mining system (DiMeX) to extract mutation-disease associations from publication abstracts. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX achieves high precision and recall, with F-scores of 0.88, 0.91 and 0.89 when evaluated on three different datasets for mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. This component has also been evaluated on different datasets and shown to achieve state-of-the-art performance. The results indicate that our system outperforms the existing mutation-disease association tools, addressing the low-precision problems suffered by most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. The results are stored in a database that can be queried and downloaded at http://biotm.cis.udel.edu/dimex/. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
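The F-scores quoted in this abstract combine precision and recall into a single number. As a minimal sketch, assuming the balanced F1 variant is meant (the abstract does not say which variant was used), the calculation looks as follows. The counts are hypothetical and are not taken from the DiMeX evaluation.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from extraction counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 88 correct associations, 12 spurious, 12 missed.
precision, recall, f1 = precision_recall_f1(88, 12, 12)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so a high score requires both few spurious and few missed extractions.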
Affiliation(s)
- A. S. M. Ashique Mahmood
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
- Tsung-Jung Wu
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, District of Columbia, United States of America
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, District of Columbia, United States of America
- McCormick Genomic and Proteomic Center, George Washington University, Washington, District of Columbia, United States of America
- K. Vijay-Shanker
- Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, United States of America
|
40
|
3DBIONOTES: A unified, enriched and interactive view of macromolecular information. J Struct Biol 2016; 194:231-4. [PMID: 26873783 DOI: 10.1016/j.jsb.2016.02.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/09/2015] [Revised: 02/02/2016] [Accepted: 02/05/2016] [Indexed: 11/24/2022]
Abstract
With the advent of high-throughput techniques like Next Generation Sequencing, the amount of biological information for genes and proteins is growing faster than ever. Structural information is also rapidly growing, especially in the cryo-electron microscopy area. However, in many cases, the proteomic and genomic data are spread across multiple databases with no simple connection to structural information. In this work we present a new web platform that integrates EMDB/PDB structures and UniProt sequences with different sources of protein annotations. The application provides an interactive interface linking sequence and structure, including EM maps, presenting the different sources of information at sequence and structural level. The web application is available at http://3dbionotes.cnb.csic.es.
|
41
|
Impact of germline and somatic missense variations on drug binding sites. THE PHARMACOGENOMICS JOURNAL 2016; 17:128-136. [PMID: 26810135 PMCID: PMC5380835 DOI: 10.1038/tpj.2015.97] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/17/2015] [Revised: 11/02/2015] [Accepted: 11/13/2015] [Indexed: 11/10/2022]
Abstract
Advancements in next-generation sequencing (NGS) technologies are generating a vast amount of data. This exacerbates the current challenge of translating NGS data into actionable clinical interpretations. We have comprehensively combined germline and somatic nonsynonymous single-nucleotide variations (nsSNVs) that affect drug binding sites in order to investigate their prevalence. The integrated data thus generated in conjunction with exome or whole-genome sequencing can be used to identify patients who may not respond to a specific drug because of alterations in drug binding efficacy due to nsSNVs in the target protein's gene. To identify the nsSNVs that may affect drug binding, protein–drug complex structures were retrieved from Protein Data Bank (PDB) followed by identification of amino acids in the protein–drug binding sites using an occluded surface method. Then, the germline and somatic mutations were mapped to these amino acids to identify which of these alter protein–drug binding sites. Using this method we identified 12 993 amino acid–drug binding sites across 253 unique proteins bound to 235 unique drugs. The integration of amino acid–drug binding sites data with both germline and somatic nsSNVs data sets revealed 3133 nsSNVs affecting amino acid–drug binding sites. In addition, a comprehensive drug target discovery was conducted based on protein structure similarity and conservation of amino acid–drug binding sites. Using this method, 81 paralogs were identified that could serve as alternative drug targets. In addition, non-human mammalian proteins bound to drugs were used to identify 142 homologs in humans that can potentially bind to drugs. In the current protein–drug pairs that contain somatic mutations within their binding site, we identified 85 proteins with significant differential gene expression changes associated with specific cancer types. 
Information on protein–drug binding, predicted drug target proteins, and the prevalence of both somatic and germline nsSNVs that disrupt these binding sites can provide valuable knowledge for personalized medicine treatment. A web portal is available where nsSNVs from an individual patient can be checked by scanning against DrugVar to determine whether any of the SNVs affect the binding of any drug in the database.
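
The mapping step described in this abstract (placing germline and somatic nsSNVs onto binding-site residues) can be illustrated with a minimal sketch. This is not the authors' pipeline or the DrugVar schema: the record types, the protein and drug identifiers, and the function name are all hypothetical, and the sketch assumes binding-site residues have already been extracted from the protein–drug structures.

```python
from collections import defaultdict
from typing import NamedTuple


class BindingSite(NamedTuple):
    protein: str   # hypothetical protein identifier
    position: int  # residue position within the protein
    drug: str      # hypothetical drug identifier


class Variant(NamedTuple):
    protein: str
    position: int  # residue position changed by the nsSNV
    ref: str       # reference amino acid
    alt: str       # substituted amino acid
    origin: str    # "germline" or "somatic"


def variants_in_binding_sites(sites, variants):
    """Return (variant, drugs) pairs for variants that hit a binding-site residue."""
    # Index binding-site residues by (protein, position) for constant-time lookup.
    index = defaultdict(set)
    for site in sites:
        index[(site.protein, site.position)].add(site.drug)

    hits = []
    for variant in variants:
        drugs = index.get((variant.protein, variant.position))
        if drugs:
            hits.append((variant, sorted(drugs)))
    return hits


# Made-up illustration data; none of these identifiers come from the paper.
SITES = [
    BindingSite("PROT_A", 120, "drug_x"),
    BindingSite("PROT_A", 121, "drug_x"),
    BindingSite("PROT_A", 120, "drug_y"),
    BindingSite("PROT_B", 45, "drug_z"),
]
VARIANTS = [
    Variant("PROT_A", 120, "L", "F", "somatic"),
    Variant("PROT_A", 300, "G", "D", "germline"),
    Variant("PROT_B", 45, "R", "H", "germline"),
]

HITS = variants_in_binding_sites(SITES, VARIANTS)
for variant, drugs in HITS:
    print(variant.protein, variant.position, variant.origin, drugs)
```

Indexing residues by (protein, position) keeps the lookup independent of the number of drugs per residue; a real implementation would also need to reconcile structure numbering with sequence coordinates, which this sketch does not attempt.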
|
42
|
Briata P, Bordo D, Puppo M, Gorlero F, Rossi M, Perrone-Bizzozero N, Gherzi R. Diverse roles of the nucleic acid-binding protein KHSRP in cell differentiation and disease. WILEY INTERDISCIPLINARY REVIEWS-RNA 2015; 7:227-40. [PMID: 26708421 DOI: 10.1002/wrna.1327] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Received: 09/30/2015] [Revised: 11/16/2015] [Accepted: 11/17/2015] [Indexed: 12/15/2022]
Abstract
The single-stranded nucleic acid-binding protein KHSRP (KH-type splicing regulatory protein) modulates RNA life and gene expression at various levels. KHSRP controls important cellular functions as different as proliferation, differentiation, metabolism, and response to infectious agents. We summarize and discuss experimental evidence providing a potential link between changes in KHSRP expression/function and human diseases including neuromuscular disorders, obesity, type II diabetes, and cancer.
Affiliation(s)
- Paola Briata
- Gene Expression Regulation Laboratory, IRCCS AOU San Martino-IST, Genova, Italy
- Domenico Bordo
- Gene Expression Regulation Laboratory, IRCCS AOU San Martino-IST, Genova, Italy
- Margherita Puppo
- Gene Expression Regulation Laboratory, IRCCS AOU San Martino-IST, Genova, Italy
- Franco Gorlero
- S.C. Ginecologia e Ostetricia Galliera Hospital, Genova, Italy; School of Medicine, DINOGMI, University of Genova, Genova, Italy
- Martina Rossi
- Gene Expression Regulation Laboratory, IRCCS AOU San Martino-IST, Genova, Italy
- Nora Perrone-Bizzozero
- Department of Neurosciences, School of Medicine, University of New Mexico, Albuquerque, NM, USA
- Roberto Gherzi
- Gene Expression Regulation Laboratory, IRCCS AOU San Martino-IST, Genova, Italy
|
43
|
Mockus SM, Patterson SE, Statz C, Bult CJ, Tsongalis GJ. Clinical Trials in Precision Oncology. Clin Chem 2015; 62:442-8. [PMID: 26607725 DOI: 10.1373/clinchem.2015.247437] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 08/06/2015] [Accepted: 10/28/2015] [Indexed: 01/07/2023]
Abstract
BACKGROUND: Availability of genomic information used in the management of cancer treatment has outpaced both regulatory and reimbursement efforts. Many types of clinical trials are underway to validate the utility of emerging genome-based biomarkers for diagnostic, prognostic, and predictive applications. Clinical trials are a key source of evidence required for US Food and Drug Administration approval of therapies and companion diagnostics and for establishing the acceptance criteria for reimbursement.
CONTENT: Determining the eligibility of patients for molecular-based clinical trials and the interpretation of data emerging from clinical trials is significantly hampered by 2 primary factors: the lack of specific reporting standards for biomarkers in clinical trials and the lack of adherence to official gene and variant naming standards. Clinical trial registries need specifics on the mutation required for enrollment as opposed to allowing a generic mutation entry such as "EGFR mutation." The use of clinical trials data in bioinformatics analysis and reporting is also gated by the lack of robust, state-of-the-art programmatic access support. An initiative is needed to develop community standards for clinical trial descriptions and outcome reporting that are modeled after similar efforts in the genomics research community.
SUMMARY: Systematic implementation of reporting standards is needed to ensure consistency and specificity of biomarker data, which will in turn enable better comparison and assessment of clinical trial outcomes across multiple studies. Reporting standards will facilitate improved identification of relevant clinical trials, aggregation and comparison of information across independent trials, and programmatic access to clinical trials databases.
Affiliation(s)
- Susan M Mockus
- The Jackson Laboratory for Genomic Medicine, Farmington, CT
- Cara Statz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT
- Gregory J Tsongalis
- Dartmouth Hitchcock Medical Center and The Audrey and Theodor Geisel School of Medicine at Dartmouth, Lebanon, NH
|
44
|
Mitchell CS, Cates A, Kim RB, Hollinger SK. Undergraduate Biocuration: Developing Tomorrow's Researchers While Mining Today's Data. JOURNAL OF UNDERGRADUATE NEUROSCIENCE EDUCATION : JUNE : A PUBLICATION OF FUN, FACULTY FOR UNDERGRADUATE NEUROSCIENCE 2015; 14:A56-A65. [PMID: 26557796 PMCID: PMC4640483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2015] [Revised: 09/17/2015] [Accepted: 09/22/2015] [Indexed: 06/05/2023]
Abstract
Biocuration is a time-intensive process that involves extraction, transcription, and organization of biological or clinical data from disjointed data sets into a user-friendly database. Curated data is subsequently used primarily for text mining or informatics analysis (bioinformatics, neuroinformatics, health informatics, etc.) and secondarily as a researcher resource. Biocuration is traditionally considered a Ph.D. level task, but a massive shortage of curators to consolidate the ever-mounting biomedical "big data" opens the possibility of utilizing biocuration as a means to mine today's data while teaching students skill sets they can utilize in any career. By developing a biocuration assembly line of simplified and compartmentalized tasks, we have enabled biocuration to be effectively performed by a hierarchy of undergraduate students. We summarize the necessary physical resources, process for establishing a data path, biocuration workflow, and undergraduate hierarchy of curation, technical, information technology (IT), quality control and managerial positions. We detail the undergraduate application and training processes and give detailed job descriptions for each position on the assembly line. We present case studies of neuropathology curation performed entirely by undergraduates, namely the construction of experimental databases of Amyotrophic Lateral Sclerosis (ALS) transgenic mouse models and clinical data from ALS patient records. Our results reveal undergraduate biocuration is scalable for a group of 8-50+ with relatively minimal required resources. Moreover, with average accuracy rates greater than 98.8%, undergraduate biocurators are equivalently accurate to their professional counterparts. Initial training to be completely proficient at the entry-level takes about five weeks with a minimal student time commitment of four hours/week.
Affiliation(s)
- Cassie S. Mitchell
- Address correspondence to: Dr. Cassie S. Mitchell, Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive, Atlanta, GA 30332.
|
45
|
Schriml LM, Mitraka E. The Disease Ontology: fostering interoperability between biological and clinical human disease-related data. Mamm Genome 2015; 26:584-9. [PMID: 26093607 PMCID: PMC4602048 DOI: 10.1007/s00335-015-9576-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 02/19/2015] [Accepted: 06/08/2015] [Indexed: 12/15/2022]
Abstract
The Disease Ontology (DO) enables cross-domain data integration through a common standard of human disease terms and their etiological descriptions. Standardized disease descriptors that are integrated across mammalian genomic resources provide a human-readable, machine-interpretable, community-driven disease corpus that unifies the representation of human common and rare diseases. The DO is populated by consensus-driven disease data descriptors that incorporate disease terms utilized by genomic and genetic projects and resources engaged in studies to understand the genetics of human disease through the study of model organisms. The DO project serves multiple roles for the model organism community by providing: (1) a structured "backbone" of disease concepts represented among the model organism databases; (2) authoritative disease curation services to researchers and resource providers; and (3) development of subsets of the DO representative of human diseases annotated to animal models curated within the model organism databases.
Affiliation(s)
- Lynn M Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
- Elvira Mitraka
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
|
46
|
Wu TJ, Schriml LM, Chen QR, Colbert M, Crichton DJ, Finney R, Hu Y, Kibbe WA, Kincaid H, Meerzaman D, Mitraka E, Pan Y, Smith KM, Srivastava S, Ward S, Yan C, Mazumder R. Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav032. [PMID: 25841438 PMCID: PMC4385274 DOI: 10.1093/database/bav032] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 11/12/2014] [Accepted: 03/13/2015] [Indexed: 01/01/2023]
Abstract
Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The Disease Ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to be a focused view of cancer terms within the DO. The DO cancer project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms represented by 63 top-level DO cancer terms. For example, the COSMIC term ‘kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma’ and TCGA term ‘Kidney renal clear cell carcinoma’ were both grouped to the term ‘Disease Ontology Identification (DOID):4467 / renal clear cell carcinoma’, which was mapped to the TopNodes_DOcancerslim term ‘DOID:263 / kidney cancer’. Mapping of diverse cancer terms to DO and the use of top-level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of the cancer term sources, where pan-cancer means including or relating to all or multiple types of cancer. The terms can be browsed from the DO web site (http://www.disease-ontology.org) and downloaded from the DO’s Apache Subversion or GitHub repositories. Database URL: http://www.disease-ontology.org
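
The two-level mapping in this abstract (source term to DO term, then DO term to a top-level slim term) can be sketched as a pair of lookups. The identifiers and labels below are the single example quoted in the abstract; the dictionary layout and function names are assumptions made for illustration and do not reflect the DO project's actual file formats.

```python
from collections import defaultdict

# Source-specific cancer term -> DO term (example taken from the abstract).
SOURCE_TO_DO = {
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"): "DOID:4467",
    ("TCGA", "Kidney renal clear cell carcinoma"): "DOID:4467",
}

# DO term -> top-level slim term (TopNodes_DOcancerslim in the abstract).
DO_TO_SLIM = {"DOID:4467": "DOID:263"}

DO_LABELS = {
    "DOID:4467": "renal clear cell carcinoma",
    "DOID:263": "kidney cancer",
}


def to_slim(source, term):
    """Resolve a source term to its slim DOID, or None if the term is unmapped."""
    doid = SOURCE_TO_DO.get((source, term))
    if doid is None:
        return None
    # A DO term without a slim parent is returned unchanged (an assumption).
    return DO_TO_SLIM.get(doid, doid)


def group_by_slim(records):
    """Group (source, term) records under their slim DOID, skipping unmapped terms."""
    groups = defaultdict(list)
    for source, term in records:
        slim = to_slim(source, term)
        if slim is not None:
            groups[slim].append((source, term))
    return dict(groups)


RECORDS = [
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"),
    ("TCGA", "Kidney renal clear cell carcinoma"),
    ("TCGA", "a term with no mapping"),  # hypothetical unmapped record
]

GROUPED = group_by_slim(RECORDS)
for slim, members in GROUPED.items():
    print(slim, DO_LABELS[slim], len(members))
```

Grouping records from different sources under one slim identifier is what makes the pan-cancer comparison possible: both kidney terms above land under the same top-level node even though their source vocabularies differ.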
Affiliation(s)
- Tsung-Jung Wu, Lynn M Schriml, Qing-Rong Chen, Maureen Colbert, Daniel J Crichton, Richard Finney, Ying Hu, Warren A Kibbe, Heather Kincaid, Daoud Meerzaman, Elvira Mitraka, Yang Pan, Krista M Smith, Sudhir Srivastava, Sari Ward, Cheng Yan, Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA; Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Center for Bioinformatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA; NASA Jet Propulsion Laboratory, Pasadena, CA, USA; Division of Cancer Prevention, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20892-9760, USA; Wellcome Trust Sanger Institute, Cambridge, UK; and McCormick Genomic and Proteomic Center, George Washington University, Washington, DC 20037, USA
|
47
|
Wan Q, Dingerdissen H, Fan Y, Gulzar N, Pan Y, Wu TJ, Yan C, Zhang H, Mazumder R. BioXpress: an integrated RNA-seq-derived gene expression database for pan-cancer analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav019. [PMID: 25819073 PMCID: PMC4377087 DOI: 10.1093/database/bav019] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 01/23/2023]
Abstract
BioXpress is a gene expression and cancer association database in which the expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, International Cancer Genome Consortium, Expression Atlas and publications. The BioXpress database includes expression data from 64 cancer types, 6361 patients and 17 469 genes with 9513 of the genes displaying differential expression between tumor and normal samples. In addition to data directly retrieved from RNA-seq data repositories, manual biocuration of publications supplements the available cancer association annotations in the database. All cancer types are mapped to Disease Ontology terms to facilitate a uniform pan-cancer analysis. The BioXpress database is easily searched using HUGO Gene Nomenclature Committee gene symbol, UniProtKB/RefSeq accession or, alternatively, can be queried by cancer type with specified significance filters. This interface along with availability of pre-computed downloadable files containing differentially expressed genes in multiple cancers enables straightforward retrieval and display of a broad set of cancer-related genes. Database URL: http://hive.biochemistry.gwu.edu/tools/bioxpress
Affiliation(s)
- Quan Wan, Hayley Dingerdissen, Yu Fan, Naila Gulzar, Yang Pan, Tsung-Jung Wu, Cheng Yan, Haichen Zhang, Raja Mazumder
- Department of Biochemistry and Molecular Medicine and McCormick Genomic and Proteomic Center, The George Washington University, Washington, DC 20037, USA
|
48
|
Pavlopoulou A, Spandidos DA, Michalopoulos I. Human cancer databases (review). Oncol Rep 2014; 33:3-18. [PMID: 25369839 PMCID: PMC4254674 DOI: 10.3892/or.2014.3579] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 10/03/2014] [Accepted: 10/31/2014] [Indexed: 12/20/2022] Open
Abstract
Cancer is one of the four major non‑communicable diseases (NCD), responsible for ~14.6% of all human deaths. Currently, there are >100 different known types of cancer and >500 genes involved in cancer. Ongoing research efforts have been focused on cancer etiology and therapy. As a result, there is an exponential growth of cancer‑associated data from diverse resources, such as scientific publications, genome‑wide association studies, gene expression experiments, gene‑gene or protein‑protein interaction data, enzymatic assays, epigenomics, immunomics and cytogenetics, stored in relevant repositories. These data are complex and heterogeneous, ranging from unprocessed, unstructured data in the form of raw sequences and polymorphisms to well‑annotated, structured data. Consequently, the storage, mining, retrieval and analysis of these data in an efficient and meaningful manner pose a major challenge to biomedical investigators. In the current review, we present the central, publicly accessible databases that contain data pertinent to cancer, the resources available for delivering and analyzing information from these databases, as well as databases dedicated to specific types of cancer. Examples of this wealth of cancer‑related information and bioinformatic tools have also been provided.
Affiliation(s)
- Athanasia Pavlopoulou
- Center of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Demetrios A Spandidos
- Laboratory of Clinical Virology, Medical School, University of Crete, Heraklion 71003, Crete, Greece
- Ioannis Michalopoulos
- Center of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
|
49
|
Kibbe WA, Arze C, Felix V, Mitraka E, Bolton E, Fu G, Mungall CJ, Binder JX, Malone J, Vasant D, Parkinson H, Schriml LM. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res 2014; 43:D1071-8. [PMID: 25348409 PMCID: PMC4383880 DOI: 10.1093/nar/gku1011] [Citation(s) in RCA: 380] [Impact Index Per Article: 34.5] [Indexed: 01/02/2023] Open
Abstract
The current version of the Human Disease Ontology (DO) (http://www.disease-ontology.org) database expands the utility of the ontology for the examination and comparison of genetic variation, phenotype, protein, drug and epitope data through the lens of human disease. DO is a biomedical resource of standardized common and rare disease concepts with stable identifiers organized by disease etiology. The content of DO has had 192 revisions since 2012, including the addition of 760 terms. Thirty-two percent of all terms now include definitions. DO has expanded the number and diversity of research communities and community members by 50+ during the past two years. These community members actively submit term requests, coordinate biomedical resource disease representation and provide expert curation guidance. Since the DO 2012 NAR paper, there have been hundreds of term requests and a steady increase in the number of DO listserv members, Twitter followers and DO website usage. DO is moving to a multi-editor model utilizing Protégé to curate DO in the Web Ontology Language (OWL). This will enable closer collaboration with the Human Phenotype Ontology, EBI's Ontology Working Group, Mouse Genome Informatics and the Monarch Initiative among others, and enhance DO's current asserted view and multiple inferred views through reasoning.
Affiliation(s)
- Warren A Kibbe
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Cesar Arze
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Victor Felix
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Elvira Mitraka
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Evan Bolton
- PubChem, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Gang Fu
- PubChem, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Janos X Binder
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, 69117, Germany; Bioinformatics Core Facility, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, 4362, Luxembourg
- James Malone
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Drashtti Vasant
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lynn M Schriml
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD 21201, USA
|
50
|
Simonyan V, Mazumder R. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes (Basel) 2014; 5:957-81. [PMID: 25271953 PMCID: PMC4276921 DOI: 10.3390/genes5040957] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 09/11/2014] [Revised: 09/22/2014] [Accepted: 09/22/2014] [Indexed: 12/30/2022]
Abstract
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
Affiliation(s)
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA.
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA.
|