1
Bahrami B, Wolfien M, Nikpour P. Integrated analysis of transcriptome and epigenome reveals ENSR00000272060 as a potential biomarker in gastric cancer. Epigenomics 2024; 16:159-173. [PMID: 38282575 DOI: 10.2217/epi-2023-0213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024] Open
Abstract
Background: Enhancer RNAs (eRNAs) are involved in gene expression regulation. Although functional roles of eRNAs in the pathophysiology of neoplasms have been reported, their involvement in gastric cancer (GC) is less known. Materials & methods: A network-based integrative approach was utilized for analyzing transcriptome and epigenome alterations in GC, and an eRNA was selected for experimental validation. Survival analysis and clinicopathological associations were also performed. Results: A hub eRNA, ENSR00000272060, showed significantly increased expression in tumor versus nontumor tissues, as well as an association with clinicopathological features. A seven-gene prognostic model was also constructed. Conclusion: The constructed network provides a comprehensive understanding of the underlying processes implicated in the progression of GC, along with a starting point from which to derive potential diagnostic/prognostic biomarkers.
Affiliation(s)
- Basireh Bahrami
  - Department of Genetics & Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, 8174673461, Isfahan, Iran
- Markus Wolfien
  - Institute for Medical Informatics & Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, 01307, Germany
- Parvaneh Nikpour
  - Department of Genetics & Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, 8174673461, Isfahan, Iran
2
Hadar N, Weintraub G, Gudes E, Dolev S, Birk OS. GeniePool: genomic database with corresponding annotated samples based on a cloud data lake architecture. Database (Oxford) 2023; 2023:baad043. [PMID: 37311148 DOI: 10.1093/database/baad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2023] [Revised: 05/17/2023] [Accepted: 05/26/2023] [Indexed: 06/15/2023]
Abstract
In recent years, there has been a huge influx of genomic data and a growing need for its phenotypic correlations, yet existing genomic databases do not allow easy storage of and access to combined phenotypic-genotypic information. Freely accessible allele frequency (AF) databases, such as gnomAD, are crucial for evaluating variants but lack correlated phenotype data. The Sequence Read Archive (SRA) accumulates hundreds of thousands of next-generation sequencing (NGS) samples tagged by their submitters and various attributes. However, samples are stored in large raw format files, inaccessible to a common user. To make thousands of NGS samples and their corresponding additional attributes easily available to clinicians and researchers, we generated a pipeline that continuously downloads raw human NGS data uploaded to SRA using SRAtoolkit and preprocesses them using the GATK pipeline. Data are then stored efficiently in a cloud data lake and can be accessed via a representational state transfer application programming interface (REST API) and a user-friendly website. We thus generated GeniePool, a simple and intuitive web service and API for querying NGS data from SRA with direct access to information related to each sample and related studies, providing significant advantages over existing databases for both clinical and research usages. Utilizing data lake infrastructure, we were able to generate a multi-purpose tool that can serve many clinical and research use cases. We expect users to explore the meta-data served via GeniePool both in daily clinical practice and in versatile research endeavours. Database URL: https://geniepool.link.
Affiliation(s)
- Noam Hadar
  - The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
  - Genetics Institute, Soroka Medical Center, Beer Sheva 84101, Israel
- Grisha Weintraub
  - Department of Computer Science, Faculty of Natural Sciences, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Ehud Gudes
  - Department of Computer Science, Faculty of Natural Sciences, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Shlomi Dolev
  - Department of Computer Science, Faculty of Natural Sciences, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Ohad S Birk
  - The Morris Kahn Laboratory of Human Genetics at the National Institute of Biotechnology in the Negev and Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
  - Genetics Institute, Soroka Medical Center, Beer Sheva 84101, Israel
3
Khalafiyan A, Emadi-Baygi M, Wolfien M, Salehzadeh-Yazdi A, Nikpour P. Construction of a three-component regulatory network of transcribed ultraconserved regions for the identification of prognostic biomarkers in gastric cancer. J Cell Biochem 2023; 124:396-408. [PMID: 36748954 DOI: 10.1002/jcb.30373] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2022] [Revised: 01/04/2023] [Accepted: 01/09/2023] [Indexed: 02/08/2023]
Abstract
Altered expression and functional roles of the transcribed ultraconserved regions (T-UCRs), genomic sequences with 100% conservation between the genomes of human, mouse, and rat, in the pathophysiology of neoplasms have already been investigated. Nevertheless, the relevance of the functions of T-UCRs in gastric cancer (GC) is still the subject of inquiry. In the current study, we first used a genome-wide profiling approach to analyze the expression of T-UCRs in GC patients. Then, we constructed a three-component regulatory network and investigated potential diagnostic and prognostic values of the T-UCRs. The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) dataset was used as a resource for the RNA-sequencing data. FeatureCounts was utilized to quantify the number of reads mapped to each T-UCR. Differential expression analysis was then conducted using DESeq2. Subsequently, interactions between T-UCRs, microRNAs (miRNAs), and messenger RNAs (mRNAs) were combined into a three-component network. Enrichment analyses were performed and a protein-protein interaction (PPI) network was constructed. The R Survival package was utilized to identify survival-related significantly differentially expressed T-UCRs (DET-UCRs). Using an in-house cohort of GC tissues, expression of two DET-UCRs was furthermore experimentally verified. Our results showed that several T-UCRs were dysregulated in TCGA-STAD tumoral samples compared to nontumoral counterparts. The constructed three-component network was composed of DET-UCR, miRNA, and mRNA nodes. Functional enrichment and PPI network analyses revealed important enriched signaling pathways and gene ontologies such as "pathway in cancer" and regulation of cell proliferation and apoptosis. Five T-UCRs were significantly correlated with the overall survival of GC patients. While no expression of uc.232 was observed in our in-house cohort of GC tissues, uc.343 showed an increased expression, although not statistically significant, in gastric tumoral tissues. The constructed three-component regulatory network of T-UCRs in GC presents a comprehensive understanding of the underlying gene expression regulation processes involved in tumor development and can serve as a basis to investigate potential prognostic biomarkers and therapeutic targets.
Affiliation(s)
- Anis Khalafiyan
  - Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Modjtaba Emadi-Baygi
  - Department of Genetics, Faculty of Basic Sciences, Shahrekord University, Shahrekord, Iran
- Markus Wolfien
  - Department of System Biology and Bioinformatics, University of Rostock, Rostock, Germany
  - Center for Medical Informatics, Dresden, Germany
- Ali Salehzadeh-Yazdi
  - Department of Life Sciences and Chemistry, Jacobs University Bremen, Bremen, Germany
- Parvaneh Nikpour
  - Department of Genetics and Molecular Biology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
4
Consent Codes: Maintaining Consent in an Ever-expanding Open Science Ecosystem. Neuroinformatics 2023; 21:89-100. [PMID: 36520344 PMCID: PMC9931855 DOI: 10.1007/s12021-022-09577-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/23/2022] [Indexed: 12/23/2022]
Abstract
We previously proposed a structure for recording consent-based data use 'categories' and 'requirements' - Consent Codes - with a view to supporting maximum use and integration of genomic research datasets, and reducing uncertainty about permissible re-use of shared data. Here we discuss clarifications and subsequent updates to the Consent Codes (v4) based on new areas of application (e.g., the neurosciences, biobanking, H3Africa), policy developments (e.g., return of research results), and further practical considerations, including developments in automated approaches to consent management.
5
Guo X, Han J, Song Y, Yin Z, Liu S, Shang X. Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions. Front Genet 2022; 13:921775. [PMID: 36046233 PMCID: PMC9421127 DOI: 10.3389/fgene.2022.921775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022] Open
Abstract
Motivation: A central goal of current biology is to establish a complete functional link between the genotype and phenotype, the so-called genotype–phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this gives us new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics still faces certain challenges, specifically: 1) when the sample size is large enough, the number of omics types is often not large enough to meet the requirements of multi-omics analysis; 2) each omics’ internal correlations are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often smaller than p, n << p, hindering the application of machine learning methods in the classification of disease outcomes. Results: To solve these issues with multi-omics and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics is also considered such that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method could achieve a high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype–phenotype association analysis in deep learning networks.
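As an illustration of the sparse connectivity mentioned in the abstract above, the following is a minimal sketch of a single masked layer in plain Python. It assumes that sparsity is imposed by multiplying the weights with a binary SNP-to-gene mask derived from prior eQTL knowledge; the abstract does not describe the actual G-EDNN implementation, so the function name `masked_dense`, the mask, and all numeric values are hypothetical.

```python
def masked_dense(x, weights, mask, bias):
    """Forward pass of one sparsely connected layer followed by ReLU.

    x       : list of input values (e.g. SNP features), length n_in
    weights : n_in x n_out nested list of weights
    mask    : n_in x n_out nested list of 0/1; 0 removes a connection
    bias    : list of length n_out
    """
    n_in, n_out = len(mask), len(mask[0])
    out = []
    for j in range(n_out):
        total = bias[j]
        for i in range(n_in):
            # A weight contributes only where the prior graph has an edge.
            total += x[i] * weights[i][j] * mask[i][j]
        out.append(max(0.0, total))  # ReLU activation
    return out


# Hypothetical prior: 3 SNPs, 2 genes; mask[i][j] = 1 if SNP i is linked to gene j.
mask = [[1, 0], [1, 0], [0, 1]]
# The 9.0 entries sit on masked-out connections and never affect the output.
weights = [[0.5, 9.0], [0.25, 9.0], [9.0, 0.5]]
bias = [0.0, 0.25]
x = [1.0, 2.0, 4.0]

gene_layer = masked_dense(x, weights, mask, bias)
print(gene_layer)  # [1.0, 2.25]
```

Because masked weights never contribute, the number of effective parameters is set by the number of edges in the prior graph rather than by the full layer size, which is one common way to limit overfitting when n << p.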
Affiliation(s)
- Xinpeng Guo
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
  - School of Air and Missile Defense, Air Force Engineering University, Xi’an, China
- Jinyu Han
  - School of Economics and Management, Chang’an University, Xi’an, China
- Yafei Song
  - School of Air and Missile Defense, Air Force Engineering University, Xi’an, China
- Zhilei Yin
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
- Shuaichen Liu
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, China
- Xuequn Shang
  - School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
  - *Correspondence: Xuequn Shang
6
Clark KC, Kwitek AE. Multi-Omic Approaches to Identify Genetic Factors in Metabolic Syndrome. Compr Physiol 2021; 12:3045-3084. [PMID: 34964118 PMCID: PMC9373910 DOI: 10.1002/cphy.c210010] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
Metabolic syndrome (MetS) is a highly heritable disease and a major public health burden worldwide. MetS diagnosis criteria are met by the simultaneous presence of any three of the following: high triglycerides, low HDL/high LDL cholesterol, insulin resistance, hypertension, and central obesity. These diseases act synergistically in people suffering from MetS and dramatically increase risk of morbidity and mortality due to stroke and cardiovascular disease, as well as certain cancers. Each of these component features is itself a complex disease, as is MetS. As a genetically complex disease, genetic risk factors for MetS are numerous, but not very powerful individually, often requiring specific environmental stressors for the disease to manifest. When taken together, all sequence variants that contribute to MetS disease risk explain only a fraction of the heritable variance, suggesting additional, novel loci have yet to be discovered. In this article, we will give a brief overview of the genetic concepts needed to interpret genome-wide association studies (GWAS) and quantitative trait locus (QTL) data, summarize the state of the field of MetS physiological genomics, and introduce tools and resources that can be used by the physiologist to integrate genomics into their own research on MetS and any of its component features. There is a wealth of phenotypic and molecular data in animal models and humans that can be leveraged as outlined in this article. Integrating these multi-omic QTL data for complex diseases such as MetS provides a means to unravel the pathways and mechanisms leading to complex disease and promise for novel treatments. © 2022 American Physiological Society. Compr Physiol 12:1-40, 2022.
Affiliation(s)
- Karen C Clark
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Anne E Kwitek
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
7
Abstract
A huge array of data in nephrology is collected through patient registries, large epidemiological studies, electronic health records, administrative claims, clinical trial repositories, mobile health devices and molecular databases. Application of these big data, particularly using machine-learning algorithms, provides a unique opportunity to obtain novel insights into kidney diseases, facilitate personalized medicine and improve patient care. Efforts to make large volumes of data freely accessible to the scientific community, increased awareness of the importance of data sharing and the availability of advanced computing algorithms will facilitate the use of big data in nephrology. However, challenges exist in accessing, harmonizing and integrating datasets in different formats from disparate sources, improving data quality and ensuring that data are secure and the rights and privacy of patients and research participants are protected. In addition, the optimism for data-driven breakthroughs in medicine is tempered by scepticism about the accuracy of calibration and prediction from in silico techniques. Machine-learning algorithms designed to study kidney health and diseases must be able to handle the nuances of this specialty, must adapt as medical practice continually evolves, and must have global and prospective applicability for external and future datasets.
8
Boegel S, Castle JC, Schwarting A. Current status of use of high throughput nucleotide sequencing in rheumatology. RMD Open 2021; 7:rmdopen-2020-001324. [PMID: 33408124 PMCID: PMC7789458 DOI: 10.1136/rmdopen-2020-001324] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/13/2020] [Revised: 09/15/2020] [Accepted: 11/24/2020] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE Here, we assess the usage of high throughput sequencing (HTS) in rheumatic research and the availability of public HTS data of rheumatic samples. METHODS We performed a semiautomated literature review on PubMed, consisting of an R-script and manual curation, as well as a manual search on the Sequence Read Archive for publicly available HTS data. RESULTS Of the 699 identified articles, rheumatoid arthritis (n=182 publications, 26%), systemic lupus erythematosus (n=161, 23%) and osteoarthritis (n=152, 22%) are among the rheumatic diseases with the most reported use of HTS assays. The most represented assay is RNA-Seq (n=457, 65%) for the identification of biomarkers in blood or synovial tissue. We also find that the quality of accompanying clinical characterisation of the sequenced patients differs dramatically, and we propose a minimal set of clinical data necessary to accompany rheumatologically relevant HTS data. CONCLUSION HTS allows the analysis of a broad spectrum of molecular features in many samples at the same time. It offers enormous potential in novel personalised diagnosis and treatment strategies for patients with rheumatic diseases. Being established in cancer research and in the field of Mendelian diseases, rheumatic diseases are about to become the third disease domain for HTS, especially the RNA-Seq assay. However, we need to start a discussion about the reporting of clinical characterisation accompanying rheumatologically relevant HTS data to make clinically meaningful use of these data.
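The percentages reported in the abstract above can be reproduced from the stated counts. The short Python check below is only a sanity check on the published figures; the dictionary labels are chosen here for illustration, and rounding to the nearest whole percent is an assumption about how the authors reported the values.

```python
# Counts taken from the abstract; 699 articles were identified in total.
total_articles = 699
counts = {
    "rheumatoid arthritis": 182,
    "systemic lupus erythematosus": 161,
    "osteoarthritis": 152,
    "RNA-Seq": 457,
}

# Share of all identified articles, rounded to the nearest whole percent.
percent = {name: round(100 * n / total_articles) for name, n in counts.items()}

for name, value in percent.items():
    print(f"{name}: {value}%")
```

Under that rounding assumption, the computed shares (26%, 23%, 22% and 65%) agree with the values given in the abstract.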
Affiliation(s)
- Sebastian Boegel
  - Department of Internal Medicine, University Center of Autoimmunity, University Medical Center Mainz, Mainz, Germany
- Andreas Schwarting
  - Department of Internal Medicine, University Center of Autoimmunity, University Medical Center Mainz, Mainz, Germany
  - Division of Rheumatology and Clinical Immunology, University Hospital Mainz, Mainz, Germany
  - Acura Rheumatology Center Rhineland Palatinate, Bad Kreuznach, Germany
9
Acosta JN, Szejko N, Falcone GJ. Mendelian Randomization in Stroke: A Powerful Approach to Causal Inference and Drug Target Validation. Front Genet 2021; 12:683082. [PMID: 34456968 PMCID: PMC8387928 DOI: 10.3389/fgene.2021.683082] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/19/2021] [Accepted: 06/28/2021] [Indexed: 02/06/2023] Open
Abstract
Stroke is a leading cause of death and disability worldwide. However, our understanding of its underlying biology and the number of available treatment options remain limited. Mendelian randomization (MR) offers a powerful approach to identify novel biological pathways and therapeutic targets for this disease. Around 100 MR studies have been conducted so far to explore, confirm, and quantify causal relationships between several exposures and risk of stroke. In this review, we summarize the current evidence arising from these studies, including those investigating ischemic stroke, hemorrhagic stroke, or both. We highlight the different types of exposures that are currently under study, ranging from well-known cardiovascular risk factors to less established inflammation-related mechanisms. Finally, we provide an overview of future avenues of research and novel approaches, including drug target validation MR, which is poised to have a substantial impact on drug development and drug repurposing.
Affiliation(s)
- Julián N. Acosta
  - Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, CT, United States
- Natalia Szejko
  - Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, CT, United States
  - Department of Neurology, Medical University of Warsaw, Warsaw, Poland
  - Department of Bioethics, Medical University of Warsaw, Warsaw, Poland
- Guido J. Falcone
  - Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, CT, United States
10
Mulder N, Zass L, Hamdi Y, Othman H, Panji S, Allali I, Fakim YJ. African Global Representation in Biomedical Sciences. Annu Rev Biomed Data Sci 2021; 4:57-81. [PMID: 34465182 DOI: 10.1146/annurev-biodatasci-102920-112550] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
African populations are diverse in their ethnicity, language, culture, and genetics. Although plagued by high disease burdens, until recently the continent has largely been excluded from biomedical studies. Along with limitations in research and clinical infrastructure, human capacity, and funding, this omission has resulted in an underrepresentation of African data and disadvantaged African scientists. This review interrogates the relative abundance of biomedical data from Africa, primarily in genomics and other omics. The visibility of African science through publications is also discussed. A challenge encountered in this review is the relative lack of annotation of data on their geographical or population origin, with African countries represented as a single group. In addition to the above-mentioned limitations, the global representation of African data may also be attributed to the hesitation to deposit data in public repositories. Whatever the reason, the disparity should be addressed, as African data have enormous value for scientists in Africa and globally.
Affiliation(s)
- Nicola Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
  - Wellcome Centre for Infectious Diseases Research in Africa (CIDRI-AFRICA), Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Lyndon Zass
  - Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Yosr Hamdi
  - Laboratory of Biomedical Genomics and Oncogenetics and Laboratory of Human and Experimental Pathology, Institut Pasteur de Tunis, University of Tunis El Manar, 1002 Tunis, Tunisia
- Houcemeddine Othman
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg 2193, South Africa
- Sumir Panji
  - Computational Biology Division, Department of Integrative Biomedical Sciences and Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
- Imane Allali
  - Laboratory of Human Pathologies Biology, Department of Biology, Faculty of Sciences, and Genomic Center of Human Pathologies, Faculty of Medicine and Pharmacy, Mohammed V University in Rabat, 1014 Rabat, Morocco
- Yasmina Jaufeerally Fakim
  - Biotechnology Unit, Department of Agricultural and Food Science, Faculty of Agriculture, University of Mauritius, Réduit 80837, Mauritius
11
Almowil ZA, Zhou SM, Brophy S. Concept libraries for automatic electronic health record based phenotyping: A review. Int J Popul Data Sci 2021; 6:1362. [PMID: 34189274 PMCID: PMC8210840 DOI: 10.23889/ijpds.v5i1.1362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] Open
Abstract
Introduction Electronic health records (EHR) are linked together to examine disease history and to undertake research into the causes and outcomes of disease. However, the process of constructing algorithms for phenotyping (e.g., identifying disease characteristics) or health characteristics (e.g., smoker) is very time consuming and resource costly. In addition, results can vary greatly between researchers. Reusing or building on algorithms that others have created is a compelling solution to these problems. However, sharing algorithms is not a common practice and many published studies do not detail the clinical code lists used by the researchers in the disease/characteristic definition. To address these challenges, a number of centres across the world have developed health data portals which contain concept libraries (e.g., algorithms for defining concepts such as disease and characteristics) in order to facilitate disease phenotyping and health studies. Objectives This study aims to review the literature of existing concept libraries, examine their utilities, identify the current gaps, and suggest future developments. Methods The five-stage framework of Arksey and O'Malley was used for the literature search. This approach included defining the research questions, identifying relevant studies through literature review, selecting eligible studies, charting and extracting data, and summarising and reporting the findings. Results This review identified seven publicly accessible Electronic Health data concept libraries which were developed in different countries including UK, USA, and Canada. The concept libraries (n = 7) investigated were either general libraries that hold phenotypes of multiple specialties (n = 4) or specialized libraries that manage only certain specialities such as rare diseases (n = 3). 
There were some clear differences between the general libraries such as archiving data from different electronic sources, and using a range of different types of coding systems. However, they share some clear similarities such as enabling users to upload their own code lists, and allowing users to use/download the publicly accessible code. In addition, there were some differences between the specialized libraries such as difference in ability to search, and if it was possible to use different searching queries such as simple or complex searches. Conversely, there were some similarities between the specialized libraries such as enabling users to upload their own concepts into the libraries and to show where they were published, which facilitates assessing the validity of the concepts. All the specialized libraries aimed to encourage the reuse of research methods such as lists of clinical code and/or metadata. Conclusion The seven libraries identified have been developed independently and appear to replicate similar concepts but in different ways. Collaboration between similar libraries would greatly facilitate the use of these libraries for the user. The process of building code lists takes time and effort. Access to existing code lists increases consistency and accuracy of definitions across studies. Concept library developers should collaborate with each other to raise awareness of their existence and of their various functions, which could increase users’ contributions to those libraries and promote their wide-ranging adoption.
Affiliation(s)
- Shang-Ming Zhou
  - Centre for Health Technology, Faculty of Health, University of Plymouth, Plymouth, PL4 8AA, UK
12
Hu RH, Chuang CY, Lin CW, Su SC, Chang LC, Wu SW, Liu YF, Yang SF. Effect of MACC1 Genetic Polymorphisms and Environmental Risk Factors in the Occurrence of Oral Squamous Cell Carcinoma. J Pers Med 2021; 11:jpm11060490. [PMID: 34072650 PMCID: PMC8228283 DOI: 10.3390/jpm11060490] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/14/2021] [Revised: 05/21/2021] [Accepted: 05/27/2021] [Indexed: 12/30/2022] Open
Abstract
MACC1 (Metastasis Associated in Colon Cancer 1) is found to regulate the hepatocyte growth factor (HGF)/Met signal pathway, and plays an important role in tumor proliferation, angiogenesis, and metastasis. However, the relationships between MACC1 SNPs (single nucleotide polymorphisms) and oral cancer remain unclear. In this study, five SNPs (rs3095007, rs1990172, rs4721888, rs975263, and rs3735615) were genotyped in 911 oral cancer patients and 1200 healthy individuals by real-time polymerase chain reaction (PCR), and the associations of oral cancer with the SNP genotypes, environmental risk factors, and clinicopathological characteristics were further analyzed. Our results showed that individuals who had the GC genotype or C-allele (GC + CC) in rs4721888 had a higher risk of oral cancer than those with the GG genotype after adjustment for betel quid chewing, cigarette smoking, and alcohol drinking. Moreover, among the 715 oral cancer patients with a betel quid chewing habit, those who had the C-allele (TC + CC) in rs975263 had a higher risk of lymph node metastasis. Further analyses of the sequences of rs4721888 revealed that the C-allele of rs4721888 would be a putative exonic splicing enhancer. In conclusion, MACC1 SNP rs4721888 would elevate the susceptibility to oral cancer, and SNP rs975263 would increase the metastasis risk for oral cancer patients with a betel quid chewing habit. Our data suggest that SNP rs4721888 could be a putative genetic marker for oral cancer, and SNP rs975263 may have the potential to be a prognostic marker of metastasis in oral cancer patients.
Affiliation(s)
- Rei-Hsing Hu
  - Department of Biomedical Sciences, Chung Shan Medical University, Taichung 402, Taiwan
- Chun-Yi Chuang
  - School of Medicine, Chung Shan Medical University, Taichung 402, Taiwan
  - Department of Otolaryngology, Chung Shan Medical University Hospital, Taichung 402, Taiwan
- Chiao-Wen Lin
  - Institute of Oral Sciences, Chung Shan Medical University, Taichung 402, Taiwan
  - Department of Dentistry, Chung Shan Medical University Hospital, Taichung 402, Taiwan
- Shih-Chi Su
  - Whole-Genome Research Core Laboratory of Human Diseases, Chang Gung Memorial Hospital, Keelung 204, Taiwan
  - Department of Dermatology, Drug Hypersensitivity Clinical and Research Center, Chang Gung Memorial Hospital, Linkou 333, Taiwan
- Lun-Ching Chang
  - Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
- Ssu-Wei Wu
  - Institute of Medicine, Chung Shan Medical University, Taichung 402, Taiwan
- Yu-Fan Liu
  - Department of Biomedical Sciences, Chung Shan Medical University, Taichung 402, Taiwan
  - Department of Pediatrics, Chung Shan Medical University Hospital, Taichung 402, Taiwan
  - Correspondence: (Y.-F.L.); (S.-F.Y.)
- Shun-Fa Yang
  - Institute of Medicine, Chung Shan Medical University, Taichung 402, Taiwan
  - Department of Medical Research, Chung Shan Medical University Hospital, Taichung 402, Taiwan
  - Correspondence: (Y.-F.L.); (S.-F.Y.)
|
13
|
Wilcock D, Jicha G, Blacker D, Albert MS, D’Orazio LM, Elahi FM, Fornage M, Hinman JD, Knoefel J, Kramer J, Kryscio RJ, Lamar M, Moghekar A, Prestopnik J, Ringman JM, Rosenberg G, Sagare A, Satizabal CL, Schneider J, Seshadri S, Sur S, Tracy RP, Yasar S, Williams V, Singh H, Mazina L, Helmer KG, Corriveau RA, Schwab K, Kivisäkk P, Greenberg SM. MarkVCID cerebral small vessel consortium: I. Enrollment, clinical, fluid protocols. Alzheimers Dement 2021; 17:704-715. [PMID: 33480172 PMCID: PMC8122220 DOI: 10.1002/alz.12215] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 05/26/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023]
Abstract
The concept of vascular contributions to cognitive impairment and dementia (VCID) derives from more than two decades of research indicating that (1) most older individuals with cognitive impairment have post mortem evidence of multiple contributing pathologies and (2) along with the preeminent role of Alzheimer's disease (AD) pathology, cerebrovascular disease accounts for a substantial proportion of this contribution. Contributing cerebrovascular processes include both overt strokes caused by etiologies such as large vessel occlusion, cardioembolism, and embolic infarcts of unknown source, and frequently asymptomatic brain injuries caused by diseases of the small cerebral vessels. Cerebral small vessel diseases such as arteriolosclerosis and cerebral amyloid angiopathy, when present at moderate or greater pathologic severity, are independently associated with worse cognitive performance and greater likelihood of dementia, particularly in combination with AD and other neurodegenerative pathologies. Based on this evidence, the US National Alzheimer's Project Act explicitly authorized accelerated research in vascular and mixed dementia along with frontotemporal and Lewy body dementia and AD itself. Biomarker development has been consistently identified as a key step toward translating scientific advances in VCID into effective prevention and treatment strategies. Validated biomarkers can serve a range of purposes in trials of candidate interventions, including (1) identifying individuals at increased VCID risk, (2) diagnosing the presence of cerebral small vessel disease or specific small vessel pathologies, (3) stratifying study participants according to their prognosis for VCID progression or treatment response, (4) demonstrating an intervention's target engagement or pharmacodynamic mechanism of action, and (5) monitoring disease progression during treatment. 
Effective biomarkers allow academic and industry investigators to advance promising interventions at early stages of development and discard interventions with low success likelihood. The MarkVCID consortium was formed in 2016 with the goal of developing and validating fluid- and imaging-based biomarkers for the cerebral small vessel diseases associated with VCID. MarkVCID consists of seven project sites and a central coordinating center, working with the National Institute of Neurological Disorders and Stroke and National Institute on Aging under cooperative agreements. Through an internal selection process, MarkVCID has identified a panel of 11 candidate biomarker "kits" (consisting of the biomarker measure and the clinical and cognitive data used to validate it) and established a range of harmonized procedures and protocols for participant enrollment, clinical and cognitive evaluation, collection and handling of fluid samples, acquisition of neuroimaging studies, and biomarker validation. The overarching goal of these protocols is to generate rigorous validating data that could be used by investigators throughout the research community in selecting and applying biomarkers to multi-site VCID trials. Key features of MarkVCID participant enrollment, clinical/cognitive testing, and fluid biomarker procedures are summarized here, with full details in the following text, tables, and supplemental material, and a description of the MarkVCID imaging biomarker procedures in a companion paper, "MarkVCID Cerebral small vessel consortium: II. Neuroimaging protocols."
The procedures described here address a range of challenges in MarkVCID's design, notably: (1) acquiring all data under informed consent and enrollment procedures that allow unlimited sharing and open-ended analyses without compromising participant privacy rights; (2) acquiring the data in a sufficiently wide range of study participants to allow assessment of candidate biomarkers across the various patient groups who might ultimately be targeted in VCID clinical trials; (3) defining a common dataset of clinical and cognitive elements that contains all the key outcome markers and covariates for VCID studies and is realistically obtainable during a practical study visit; (4) instituting best fluid-handling practices for minimizing avoidable sources of variability; and (5) establishing rigorous procedures for testing the reliability of candidate fluid-based biomarkers across replicates, assay runs, sites, and time intervals (collectively defined as the biomarker's instrumental validity).

Participant Enrollment
Project sites enroll diverse study cohorts using site-specific inclusion and exclusion criteria so as to provide generalizable validation data across a range of cognitive statuses, risk factor profiles, small vessel disease severities, and racial/ethnic characteristics representative of the diverse patient groups that might be enrolled in a future VCID trial. MarkVCID project sites include both prospectively enrolling centers and centers providing extant data and samples from preexisting community- and population-based studies. With approval of local institutional review boards, all sites incorporate MarkVCID consensus language into their study documents and informed consent agreements. The consensus language asks prospectively enrolled participants to consent to unrestricted access to their data and samples for research analysis within and outside MarkVCID. The data are transferred and stored as a de-identified dataset as defined by the Health Insurance Portability and Accountability Act Privacy Rule. Similar human subject protection and informed consent language serve as the basis for MarkVCID Research Agreements that act as contracts and data/biospecimen sharing agreements across the consortium.

Clinical and Cognitive Data
Clinical and cognitive data are collected across prospectively enrolling project sites using common MarkVCID instruments. The clinical data elements are modified from study protocols already in use such as the Alzheimer's Disease Center program Uniform Data Set Version 3 (UDS3), with additional focus on VCID-related items such as prior stroke and cardiovascular disease, vascular risk factors, focal neurologic findings, and blood testing for vascular risk markers and kidney function including hemoglobin A1c, cholesterol subtypes, triglycerides, and creatinine. Cognitive assessments and rating instruments include the Clinical Dementia Rating Scale, Geriatric Depression Scale, and most of the UDS3 neuropsychological battery. The cognitive testing requires ≈60 to 90 minutes. Study staff at the prospectively recruiting sites undergo formalized training in all measures and review of their first three UDS3 administrations by the coordinating center.

Collection and Handling of Fluid Samples
Fluid sample types collected for MarkVCID biomarker kits are serum, ethylenediaminetetraacetic acid-plasma, platelet-poor plasma, and cerebrospinal fluid (CSF) with additional collection of packed cells to allow future DNA extraction and analyses. MarkVCID fluid guidelines to minimize variability include fasting morning fluid collections, rapid processing, standardized handling and storage, and avoidance of CSF contact with polystyrene.

Instrumental Validation for Fluid-Based Biomarkers
Instrumental validation of MarkVCID fluid-based biomarkers is operationally defined as determination of intra-plate and inter-plate repeatability, inter-site reproducibility, and test-retest repeatability. MarkVCID study participants both with and without advanced small vessel disease are selected for these determinations to assess instrumental validity across the full biomarker assay range. Intra- and inter-plate repeatability is determined by repeat assays of single split fluid samples performed at individual sites. Inter-site reproducibility is determined by assays of split samples distributed to multiple sites. Test-retest repeatability is determined by assay of three samples acquired from the same individual, collected at least 5 days apart over a 30-day period and assayed on a single plate. The MarkVCID protocols are designed to allow direct translation of the biomarker validation results to multicenter trials. They also provide a template for outside groups to perform analyses using identical methods and therefore allow direct comparison of results across studies and centers. All MarkVCID protocols are available to the biomedical community and intended to be shared. In addition to the instrumental validation procedures described here, each of the MarkVCID kits will undergo biological validation to determine whether the candidate biomarker measures important aspects of VCID such as cognitive function. Analytic methods and results of these validation studies for the 11 MarkVCID biomarker kits will be published separately. The results of this rigorous validation process will ultimately determine each kit's potential usefulness for multicenter interventional trials aimed at preventing or treating small vessel disease related VCID.
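The abstract does not say which statistic MarkVCID uses to quantify intra- and inter-plate repeatability. One common choice, assumed here purely for illustration, is the percent coefficient of variation (CV) of replicate measurements of the same split sample. The function names and the input layout (one list of replicates per plate) are hypothetical.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / average


def intra_and_inter_plate_cv(plates):
    """Summarize repeatability for one split sample assayed on several plates.

    plates: list of lists, each inner list holding the replicate
    measurements from one plate (assumed layout).
    Returns (mean within-plate CV, CV of the plate means).
    """
    intra = mean([percent_cv(replicates) for replicates in plates])
    inter = percent_cv([mean(replicates) for replicates in plates])
    return intra, inter


# Hypothetical replicate readings, for illustration only.
example_plates = [[10.1, 9.9, 10.0], [10.4, 10.6, 10.5]]
intra_cv, inter_cv = intra_and_inter_plate_cv(example_plates)
print(f"intra-plate CV = {intra_cv:.2f}%, inter-plate CV = {inter_cv:.2f}%")
```

The same `percent_cv` helper could in principle be applied to split samples assayed at different sites, or to the three test-retest samples from one individual, but the acceptance thresholds would have to come from the consortium's own protocols.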
Affiliation(s)
- Donna Wilcock
- Sanders-Brown Center on Aging, University of Kentucky College of Medicine, Lexington, KY 40504, USA
| | - Gregory Jicha
- Sanders-Brown Center on Aging, University of Kentucky College of Medicine, Lexington, KY 40504, USA
| | - Deborah Blacker
- Department of Epidemiology, Harvard T.H Chan School of Public Health and Department of Psychiatry, Harvard Medical School, Boston, MA 02115, USA
| | - Marilyn S. Albert
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Lina M. D’Orazio
- Department of Neurology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - Fanny M. Elahi
- Center for Memory and Aging, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143, USA
| | - Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School and Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Jason D. Hinman
- David Geffen School of Medicine, Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA
| | - Janice Knoefel
- Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM, USA
| | - Joel Kramer
- David Geffen School of Medicine, Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA
| | - Richard J. Kryscio
- Sanders-Brown Center on Aging, University of Kentucky College of Medicine, Lexington, KY 40504, USA
| | - Melissa Lamar
- Rush Alzheimer’s Disease Center, Rush University, Chicago, IL, USA
| | - Abhay Moghekar
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Jillian Prestopnik
- Center for Memory and Aging, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - John M. Ringman
- Department of Neurology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - Gary Rosenberg
- Center for Memory and Aging, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Abhay Sagare
- Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Claudia L. Satizabal
- Glenn Biggs Institute for Alzheimer’s & Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| | - Julie Schneider
- Rush Alzheimer’s Disease Center, Rush University, Chicago, IL, USA
| | - Sudha Seshadri
- Glenn Biggs Institute for Alzheimer’s & Neurodegenerative Diseases, University of Texas Health San Antonio, San Antonio, TX 78229, USA
| | - Sandeepa Sur
- Department of Radiology and Radiological Science, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Russell P. Tracy
- Department of Pathology and Laboratory Medicine, University of Vermont Larner College of Medicine, Burlington, VT 05405, USA
| | - Sevil Yasar
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Victoria Williams
- Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA
| | - Herpreet Singh
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lidiya Mazina
- Neurological Clinical Research Institute, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Karl G. Helmer
- Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA
| | | | - Kristin Schwab
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Pia Kivisäkk
- Alzheimer’s Clinical and Translational Research Unit, Massachusetts General Hospital, Boston, MA 02129, USA
| | - Steven M. Greenberg
- Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
| | | |
|
14
|
Ma L, Lou S, Miao Z, Yao S, Yu X, Kan S, Zhu G, Yang F, Zhang C, Zhang W, Wang M, Wang L, Pan Y. Identification of novel susceptibility loci for non-syndromic cleft lip with or without cleft palate. J Cell Mol Med 2020; 24:13669-13678. [PMID: 33108691 PMCID: PMC7754035 DOI: 10.1111/jcmm.15878] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/10/2020] [Revised: 07/29/2020] [Accepted: 08/17/2020] [Indexed: 12/25/2022]
Abstract
Although several genome‐wide association studies (GWAS) of non‐syndromic cleft lip with or without cleft palate (NSCL/P) have been reported, further novel association signals remain to be identified. Here, we performed an in‐depth analysis of our previously published Chinese GWAS cohort, with replication in additional dbGaP case‐parent trios and another in‐house Nanjing cohort, and identified five novel significant association signals (rs11119445: 3’ of SERTAD4, P = 6.44 × 10^−14; rs227227 and rs12561877: intron of SYT14, P = 5.02 × 10^−13 and 2.80 × 10^−11, respectively; rs643118: intron of TRAF3IP3, P = 4.45 × 10^−6; rs2095293: intron of NR6A1, P = 2.98 × 10^−5). The mean (standard deviation) of the weighted genetic risk score (wGRS) from these SNPs was 1.83 (0.65) for NSCL/P cases and 1.58 (0.68) for controls (P = 2.67 × 10^−16). rs643118 was identified as a shared susceptibility factor for NSCL/P among Asians and Europeans, while rs227227 may contribute to the risk of NSCL/P as well as NSCPO. In addition, sertad4 knockdown in zebrafish resulted in down‐regulation of sox2 and caused oedema around the heart and mandibular deficiency, compared with control embryos. Taken together, this study has improved our understanding of the genetic susceptibility to NSCL/P and provided further clues to its aetiology in the Chinese population.
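The abstract does not describe how the wGRS was computed. A common formulation, assumed here, sums each person's risk-allele count at every SNP, weighted by the natural log of that SNP's odds ratio. The sketch below follows that assumption; the function name, allele counts, and odds ratios are hypothetical and are not taken from the study.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted genetic risk score under a log-odds-ratio weighting.

    risk_allele_counts: risk alleles carried at each SNP (0, 1, or 2).
    odds_ratios: per-allele odds ratio for each SNP, in the same order.
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    if any(ratio <= 0 for ratio in odds_ratios):
        raise ValueError("odds ratios must be positive")
    return sum(count * math.log(ratio)
               for count, ratio in zip(risk_allele_counts, odds_ratios))


# Hypothetical genotype and effect sizes for five SNPs, illustration only.
example_counts = [2, 1, 0, 1, 2]
example_odds_ratios = [1.5, 1.4, 1.3, 1.2, 1.2]
print(f"wGRS = {weighted_grs(example_counts, example_odds_ratios):.3f}")
```

Scores computed this way for cases and controls could then be compared, for example by their means and standard deviations as the abstract reports.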
Affiliation(s)
- Lan Ma
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China.,Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Shu Lou
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Ziyue Miao
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Siyue Yao
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Xin Yu
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Shiyi Kan
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Guirong Zhu
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Fan Yang
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Chi Zhang
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China
| | - Weibing Zhang
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China.,Department of Orthodontics, Affiliated Hospital of Stomatology, Nanjing Medical University, Nanjing, China
| | - Meilin Wang
- Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, China.,Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China
| | - Lin Wang
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China.,Department of Orthodontics, Affiliated Hospital of Stomatology, Nanjing Medical University, Nanjing, China.,State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
| | - Yongchu Pan
- Jiangsu Key Laboratory of Oral Diseases, Nanjing Medical University, Nanjing, China.,Department of Orthodontics, Affiliated Hospital of Stomatology, Nanjing Medical University, Nanjing, China.,State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, China
| |
|
15
|
Llamas-Velasco M, Reolid A, Sanz-García A, Alonso-Guirado L, García-Martínez J, Sánchez-Jiménez P, Muñoz-Aceituno E, Daudén E, Abad-Santos F, Ovejero-Benito MC. Methylation in psoriasis. Does sex matter? J Eur Acad Dermatol Venereol 2020; 35:e161-e163. [PMID: 32805747 DOI: 10.1111/jdv.16888] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/27/2020] [Revised: 08/05/2020] [Accepted: 08/11/2020] [Indexed: 01/06/2023]
Affiliation(s)
- M Llamas-Velasco
- Dermatology Department, Hospital Universitario de la Princesa, Instituto de Investigación Sanitaria La Princesa (IIS-IP), Madrid, Spain
| | - A Reolid
- Dermatology Department, Hospital Universitario de la Princesa, Instituto de Investigación Sanitaria La Princesa (IIS-IP), Madrid, Spain
| | - A Sanz-García
- Data Analysis Unit, Hospital Universitario de la Princesa, Instituto de Investigación Sanitaria La Princesa (IIS-IP), Madrid, Spain
| | - L Alonso-Guirado
- Genetic & Molecular Epidemiology Group, Spanish National Cancer Research Center (CNIO), Madrid, Spain
| | - J García-Martínez
- Data Analysis Unit, Hospital Universitario de la Princesa, Instituto de Investigación Sanitaria La Princesa (IIS-IP), Madrid, Spain
| | - P Sánchez-Jiménez
- Clinical Pharmacology Department, Hospital Universitario de la Princesa, Instituto Teófilo Hernando, Instituto de Investigación Sanitaria la Princesa (IIS-IP), Universidad Autónoma de Madrid (UAM), Madrid, Spain.,NIMGenetics Genómica y Medicina S.L., Madrid, Spain
| | - E Muñoz-Aceituno
- Dermatology Department, Hospital Universitario de la Princesa, Instituto de Investigación Sanitaria La Princesa (IIS-IP), Madrid, Spain
| | - E Daudén
- Dermatology Department, Hospital Universitario de la Princesa, Instituto de Investigación Sanitaria La Princesa (IIS-IP), Madrid, Spain
| | - F Abad-Santos
- Clinical Pharmacology Department, Hospital Universitario de la Princesa, Instituto Teófilo Hernando, Instituto de Investigación Sanitaria la Princesa (IIS-IP), Universidad Autónoma de Madrid (UAM), Madrid, Spain.,Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Madrid, Spain
| | - M C Ovejero-Benito
- Clinical Pharmacology Department, Hospital Universitario de la Princesa, Instituto Teófilo Hernando, Instituto de Investigación Sanitaria la Princesa (IIS-IP), Universidad Autónoma de Madrid (UAM), Madrid, Spain
| |
|
16
|
Versmée G, Versmée L, Dusenne M, Jalali N, Avillach P. dbgap2x: an R package to explore and extract data from the database of Genotypes and Phenotypes (dbGaP). Bioinformatics 2020; 36:1305-1306. [PMID: 31504194 DOI: 10.1093/bioinformatics/btz680] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/20/2019] [Revised: 07/26/2019] [Accepted: 08/27/2019] [Indexed: 11/14/2022]
Abstract
Summary: Based on the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets with more than 1000 studies. However, navigating the website and understanding the relationship between the studies are not easy tasks. Moreover, the decryption of the files is a complex procedure. In this study we propose the dbgap2x R package that covers a broad range of functions for searching dbGaP studies, exploring the characteristics of a study and easily decrypting the files from dbGaP. Availability and implementation: dbgap2x is an R package with the code available at https://github.com/gversmee/dbgap2x. A containerized version including the package, a Jupyter server and a Notebook example is available at https://hub.docker.com/r/gversmee/dbgap2x. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Grégoire Versmée
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Laura Versmée
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Mikaël Dusenne
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Niloofar Jalali
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| |
|
17
|
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2020; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022]
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Huan Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Zhi Xie
- State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| |
|
18
|
Conte N, Mason JC, Halmagyi C, Neuhauser S, Mosaku A, Yordanova G, Chatzipli A, Begley DA, Krupke DM, Parkinson H, Meehan TF, Bult CC. PDX Finder: A portal for patient-derived tumor xenograft model discovery. Nucleic Acids Res 2020; 47:D1073-D1079. [PMID: 30535239 PMCID: PMC6323912 DOI: 10.1093/nar/gky984] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 08/13/2018] [Accepted: 11/30/2018] [Indexed: 11/12/2022]
Abstract
Patient-derived tumor xenograft (PDX) mouse models are a versatile oncology research platform for studying tumor biology and for testing chemotherapeutic approaches tailored to genomic characteristics of individual patients’ tumors. PDX models are generated and distributed by a diverse group of academic labs, multi-institution consortia and contract research organizations. The distributed nature of PDX repositories and the use of different metadata standards for describing model characteristics present a significant challenge to identifying PDX models relevant to specific cancer research questions. The Jackson Laboratory and EMBL-EBI are addressing these challenges by co-developing PDX Finder, a comprehensive open global catalog of PDX models and their associated datasets. Within PDX Finder, model attributes are harmonized and integrated using a previously developed community minimal information standard to support consistent searching across the originating resources. Links to repositories are provided from the PDX Finder search results to facilitate model acquisition and/or collaboration. The PDX Finder resource currently contains information for 1985 PDX models of diverse cancers including those from large resources such as the Patient-Derived Models Repository, PDXNet and EurOPDX. Individuals or organizations that generate and distribute PDXs are invited to increase the ‘findability’ of their models by participating in the PDX Finder initiative at www.pdxfinder.org.
Affiliation(s)
- Nathalie Conte
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jeremy C Mason
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Csaba Halmagyi
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Steven Neuhauser
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - Abayomi Mosaku
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Galabina Yordanova
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Aikaterini Chatzipli
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Dale A Begley
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - Debra M Krupke
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| | - Helen Parkinson
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Terrence F Meehan
- European Molecular Biology Laboratory- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carol C Bult
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
| |
|
19
|
Kodama Y, Mashima J, Kosuge T, Ogasawara O. DDBJ update: the Genomic Expression Archive (GEA) for functional genomics data. Nucleic Acids Res 2020; 47:D69-D73. [PMID: 30357349 PMCID: PMC6323915 DOI: 10.1093/nar/gky1002] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 09/14/2018] [Accepted: 10/09/2018] [Indexed: 12/13/2022]
Abstract
The Genomic Expression Archive (GEA) for functional genomics data from microarray and high-throughput sequencing experiments has been established at the DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp), which is a member of the International Nucleotide Sequence Database Collaboration (INSDC) with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center collects nucleotide sequence data and associated biological information from researchers and also services the Japanese Genotype–phenotype Archive (JGA) with the National Bioscience Database Center for collecting human data. To automate the submission process, we have implemented the DDBJ BioSample validator which checks submitted records, auto-corrects their format, and issues error messages and warnings if necessary. The DDBJ Center also operates the NIG supercomputer, prepared for analyzing large-scale genome sequences. We now offer a secure platform specifically to handle personal human genomes. This report describes database activities for INSDC and JGA over the past year, the newly launched GEA, submission, retrieval, and analysis services available in our supercomputer system and their recent developments.
Affiliation(s)
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| |
|
20
|
Lu X, Ding Y, Bai Y, Li J, Zhang G, Wang S, Gao W, Xu L, Wang H. Detection of Allosteric Effects of lncRNA Secondary Structures Altered by SNPs in Human Diseases. Front Cell Dev Biol 2020; 8:242. [PMID: 32322582 PMCID: PMC7156602 DOI: 10.3389/fcell.2020.00242] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/06/2020] [Accepted: 03/23/2020] [Indexed: 12/19/2022]
Abstract
Recent studies have shown that structuralized long non-coding RNAs (lncRNAs) play important roles in genetic and epigenetic processes. The spatial structures of most lncRNAs can be altered by distinct in vivo and in vitro cellular environments, as well as by DNA structural variations, such as single-nucleotide polymorphisms (SNPs) and variants (SNVs). In the present study, we extended candidate SNPs that had linkage disequilibria with those significantly associated with lung diseases in genome-wide association studies in order to investigate potential disease mechanisms originating from SNP structural changes of host lncRNAs. Following accurate alignments, we recognized 115 ternary-relationship pairs among 41 SNPs, 10 lncRNA transcripts, and 1 type of lung disease (adenocarcinoma of the lung). Then, we evaluated the structural heterogeneity induced by SNP alleles by developing a local-RNA-structure alignment algorithm and employing randomized strategies to determine the significance of structural variation. We identified four ternary-relationship pairs that were significantly associated with SNP-induced lncRNA allosteric effects. Moreover, these conformational changes disrupted the interactive regions and binding affinities of lncRNA-HCG23 and TF-E2F6, suggesting that these may represent regulatory mechanisms in lung diseases. Taken together, our findings support that SNP-induced changes in lncRNA conformations regulate many biological processes, providing novel insight into the role of the lncRNA “structurome” in human diseases.
Affiliation(s)
- Xiaoyan Lu
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Yu Ding
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Yu Bai
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Jing Li
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Guosi Zhang
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
| | - Siyu Wang
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
| | - Wenyan Gao
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Liangde Xu
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Hong Wang
- School of Ophthalmology and Optometry and Eye Hospital, School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China.,College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| |
|
21
|
Acosta JN, Brown SC, Falcone GJ. Genetic Variation and Response to Neurocritical Illness: a Powerful Approach to Identify Novel Pathophysiological Mechanisms and Therapeutic Targets. Neurotherapeutics 2020; 17:581-592. [PMID: 31975153 PMCID: PMC7283396 DOI: 10.1007/s13311-020-00837-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022] Open
Abstract
Disease-specific therapeutic options for critically ill neurological patients are limited. The identification of new preventive, therapeutic, and rehabilitation strategies is of the utmost importance in the field of neurocritical care research. Population genetics offers powerful tools to identify and prioritize biological pathways to be targeted by novel interventions. New treatments with supportive genetic evidence have twice the chances of obtaining final FDA approval compared to those without this support. Large collaborations, public access to data, reproducible science, and innovative analytical methods have exponentially increased the pace of discoveries related to neurocritical care genetics.
Affiliation(s)
- Julián N Acosta
- Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, Connecticut, 06520, USA
| | - Stacy C Brown
- Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, Connecticut, 06520, USA
| | - Guido J Falcone
- Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, Connecticut, 06520, USA.
| |
|
22
|
Sadato D, Ogawa M, Hirama C, Hishima T, Horiguchi S, Harada Y, Shimoyama T, Itokawa M, Ohashi K, Oboki K. Potential prognostic impact of EBV RNA-seq reads in gastric cancer: a reanalysis of The Cancer Genome Atlas cohort. FEBS Open Bio 2020; 10:455-467. [PMID: 31991047 PMCID: PMC7050242 DOI: 10.1002/2211-5463.12803] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/20/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 01/18/2023] Open
Abstract
Epstein-Barr virus (EBV)-associated gastric cancer (EBVaGC), whose prognosis remains controversial, is diagnosed by in situ hybridization of EBV-derived EBER1/2 small RNAs. In The Cancer Genome Atlas (TCGA) Stomach Adenocarcinoma (STAD) project, the EBV molecular subtype was determined through a combination of multiple next-generation sequencing methods, but not by the gold standard in situ hybridization method. This leaves unanswered questions regarding the discordance of EBV positivity detected by different approaches and the threshold of sequencing reads. Therefore, we reanalyzed the TCGA-STAD RNA sequencing (RNA-seq) dataset including 375 tumor and 32 normal samples, using our analysis pipeline. We defined a reliable threshold for EBV-derived next-generation sequencing reads by mapping them to the EBV genome with three different random arbitrary alignments. We analyzed the prognostic impact of EBV status on the histopathological subtypes of gastric cancer. EBV-positive cases identified by reanalysis comprised nearly half of the cases (49.6%) independent from infiltrating lymphocyte signatures, and showed significantly longer overall survival for adenocarcinomas of the 'not-otherwise-specified' type [P = 0.016 (log-rank test); hazard ratios (HR): 0.476; 95% CI: 0.260-0.870, P = 0.016 (Cox univariate analysis)], but shorter overall survival for the tubular adenocarcinoma type [P = 0.005 (log-rank test); HR: 3.329; 95% CI: 1.406-7.885, P = 0.006 (Cox univariate analysis)]. These results demonstrate that the EBV positivity rates were higher when determined by RNA-seq than when determined by EBER1/2 in situ hybridization. The RNA-seq-based EBV positivity demonstrated distinct results for gastric cancer prognosis depending on the histopathological subtype, suggesting its potential to be used in clinical prognoses.
Affiliation(s)
- Daichi Sadato
- Division of Hematology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
- Center for Medical Research Cooperation, Tokyo Metropolitan Institute of Medical Science, Setagaya-ku, Japan
- Divisions of Clinical Research Support, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Mina Ogawa
- Center for Medical Research Cooperation, Tokyo Metropolitan Institute of Medical Science, Setagaya-ku, Japan
- Divisions of Clinical Research Support, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
- Department of Medical Oncology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Chizuko Hirama
- Division of Hematology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
- Center for Medical Research Cooperation, Tokyo Metropolitan Institute of Medical Science, Setagaya-ku, Japan
- Divisions of Clinical Research Support, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Tsunekazu Hishima
- Department of Pathology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Shin-Ichiro Horiguchi
- Department of Pathology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Yuka Harada
- Divisions of Clinical Research Support, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Tatsu Shimoyama
- Department of Medical Oncology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Masanari Itokawa
- Center for Medical Research Cooperation, Tokyo Metropolitan Institute of Medical Science, Setagaya-ku, Japan
| | - Kazuteru Ohashi
- Division of Hematology, Tokyo Metropolitan Cancer and Infectious Diseases Center Komagome Hospital, Bunkyo-ku, Japan
| | - Keisuke Oboki
- Center for Medical Research Cooperation, Tokyo Metropolitan Institute of Medical Science, Setagaya-ku, Japan
| |
|
23
|
Ogasawara O, Kodama Y, Mashima J, Kosuge T, Fujisawa T. DDBJ Database updates and computational infrastructure enhancement. Nucleic Acids Res 2020; 48:D45-D50. [PMID: 31724722 PMCID: PMC7145692 DOI: 10.1093/nar/gkz982] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/26/2019] [Revised: 10/10/2019] [Accepted: 10/21/2019] [Indexed: 12/30/2022] Open
Abstract
The Bioinformation and DDBJ Center (https://www.ddbj.nig.ac.jp) in the National Institute of Genetics (NIG) maintains a primary nucleotide sequence database as a member of the International Nucleotide Sequence Database Collaboration (INSDC) in partnership with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The NIG operates the NIG supercomputer as a computational basis for the construction of DDBJ databases and as a large-scale computational resource for Japanese biologists and medical researchers. In order to accommodate the rapidly growing amount of deoxyribonucleic acid (DNA) nucleotide sequence data, NIG replaced its supercomputer system, which is designed for big data analysis of genome data, in early 2019. The new system is equipped with 30 PB of DNA data archiving storage; large-scale parallel distributed file systems (13.8 PB in total) and 1.1 PFLOPS computation nodes and graphics processing units (GPUs). Moreover, as a starting point of developing multi-cloud infrastructure of bioinformatics, we have also installed an automatic file transfer system that allows users to prevent data lock-in and to achieve cost/performance balance by exploiting the most suitable environment from among the supercomputer and public clouds for different workloads.
Affiliation(s)
- Osamu Ogasawara
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| | - Yuichi Kodama
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| | - Jun Mashima
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| | - Takehide Kosuge
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| | - Takatomo Fujisawa
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
| |
|
24
|
Abstract
The Canadian Genomics Partnership for Rare Diseases, spearheaded by Genome Canada, will integrate genome-wide sequencing to rare disease clinical care in Canada. Centralized and tiered models of data stewardship are proposed to ensure that the data generated can be shared for secondary clinical, research, and quality assurance purposes in compliance with ethics and law. The principal ethico-legal obligations of clinicians, researchers, and institutions are synthesized. Governance infrastructures such as registered access platforms, data access compliance offices, and Beacon systems are proposed as potential organizational and technical foundations of responsible rare disease data sharing. The appropriate delegation of responsibilities, the transparent communication of rights and duties, and the integration of data privacy safeguards into infrastructure design are proposed as the cornerstones of rare disease data stewardship.
Affiliation(s)
- Alexander Bernier
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC H3A 0G1, Canada
| |
|
25
|
Jung JH, Hwang J, Kim JH, Sim DY, Im E, Park JE, Park WY, Shim BS, Kim B, Kim SH. Phyotochemical candidates repurposing for cancer therapy and their molecular mechanisms. Semin Cancer Biol 2019; 68:164-174. [PMID: 31883914 DOI: 10.1016/j.semcancer.2019.12.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/30/2019] [Revised: 11/18/2019] [Accepted: 12/15/2019] [Indexed: 12/24/2022]
Abstract
Although chemotherapy, radiotherapy and surgery have achieved limited success in cancer therapy over recent decades, cancer still imposes a high burden on human health worldwide. Drug repurposing has recently become attractive because it costs less and takes less time than classical drug discovery; for example, metformin, derived from Galega officinalis and originally approved by the FDA for treating type 2 diabetes, is globally valued at millions of US dollars for cancer therapy. Because most previous reviews focused on FDA-approved drugs and synthetic agents, the current review discusses the anticancer potential of phytochemicals originally approved for the treatment of cardiovascular diseases, diabetes, infectious diarrhea, depression and malaria, along with their molecular mechanisms and efficacies, and suggests future research perspectives.
Affiliation(s)
- Ji Hoon Jung
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Jisung Hwang
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Ju-Ha Kim
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Deok Yong Sim
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Eunji Im
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Ji Eon Park
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Woon Yi Park
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Bum-Sang Shim
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Bonglee Kim
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea
| | - Sung-Hoon Kim
- Cancer Molecular Target Herbal Research Laboratory, College of Korean Medicine, Seoul 02447, Republic of Korea.
| |
|
26
|
Rasnic R, Brandes N, Zuk O, Linial M. Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants. BMC Cancer 2019; 19:783. [PMID: 31391007 PMCID: PMC6686424 DOI: 10.1186/s12885-019-5994-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/18/2018] [Accepted: 07/30/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND In recent years, research on cancer predisposition germline variants has emerged as a prominent field. The identity of somatic mutations is based on a reliable mapping of the patient germline variants. In addition, the statistics of germline variants frequencies in healthy individuals and cancer patients is the basis for seeking candidates for cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data including deep sequencing for more than 30 types of cancer from > 10,000 patients. METHODS Our hypothesis in this study is that whole exome sequences from blood samples of cancer patients are not expected to show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2241 samples from TCGA. In our analysis we accounted for inherent variables in the data including the different variant calling protocols, sequencing platforms, and ethnicity. RESULTS We report on substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is further expressed in nucleotide composition and variant frequencies. Importantly, the batch effect causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most of known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants. CONCLUSION TCGA germline data is exposed to strong batch effects with substantial variabilities among TCGA sequencing centers. We claim that those batch effects are consequential for numerous TCGA pan-cancer studies. 
In particular, these effects may compromise the reliability and the potency to detect new cancer predisposition genes. Furthermore, interpretation of pan-cancer analyses should be revisited in view of the source of the genomic data after accounting for the reported batch effects.
Affiliation(s)
- Roni Rasnic
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - Nadav Brandes
- The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Or Zuk
- Department of Statistics, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| |
|
27
|
Kodama Y, Mashima J, Kosuge T, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T. DNA Data Bank of Japan: 30th anniversary. Nucleic Acids Res 2018; 46:D30-D35. [PMID: 29040613 PMCID: PMC5753283 DOI: 10.1093/nar/gkx926] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 09/15/2017] [Accepted: 10/02/2017] [Indexed: 11/17/2022] Open
Abstract
The DNA Data Bank of Japan (DDBJ) Center (http://www.ddbj.nig.ac.jp) has been providing public data services for 30 years since 1987. We are collecting nucleotide sequence data and associated biological information from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The DDBJ Center also services the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect genotype and phenotype data of human individuals. Here, we outline our database activities for INSDC and JGA over the past year, and introduce submission, retrieval and analysis services running on our supercomputer system and their recent developments. Furthermore, we highlight our responses to the amended Japanese rules for the protection of personal information and the launch of the DDBJ Group Cloud service for sharing pre-publication data among research groups.
Affiliation(s)
- Yuichi Kodama
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Jun Mashima
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Takehide Kosuge
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Eli Kaminuma
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Osamu Ogasawara
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Kousaku Okubo
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Yasukazu Nakamura
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan
| | - Toshihisa Takagi
- DDBJ Center, National Institute of Genetics, Shizuoka 411-8540, Japan.,National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
| |
|
28
|
Lin HY, Callan CY, Fang Z, Tung HY, Park JY. Interactions of PVT1 and CASC11 on Prostate Cancer Risk in African Americans. Cancer Epidemiol Biomarkers Prev 2019; 28:1067-1075. [PMID: 30914434 DOI: 10.1158/1055-9965.epi-18-1092] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/08/2018] [Revised: 01/09/2019] [Accepted: 03/21/2019] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND African American (AA) men have a higher risk of developing prostate cancer than white men. SNPs are known to play an important role in developing prostate cancer. The impact of PVT1 and its neighborhood genes (CASC11 and MYC) on prostate cancer risk are getting more attention recently. The interactions among these three genes associated with prostate cancer risk are understudied, especially for AA men. The objective of this study is to investigate SNP-SNP interactions in the CASC11-MYC-PVT1 region associated with prostate cancer risk in AA men. METHODS We evaluated 205 SNPs using the 2,253 prostate cancer patients and 2,423 controls and applied multiphase (discovery-validation) design. In addition to SNP individual effects, SNP-SNP interactions were evaluated using the SNP Interaction Pattern Identifier, which assesses 45 patterns. RESULTS Three SNPs (rs9642880, rs16902359, and rs12680047) and 79 SNP-SNP pairs were significantly associated with prostate cancer risk. These two SNPs (rs16902359 and rs9642880) in CASC11 interacted frequently with other SNPs with 56 and 9 pairs, respectively. We identified the novel interaction of CASC11-PVT1, which is the most common gene interaction (70%) in the top 79 pairs. Several top SNP interactions have a moderate to large effect size (OR, 0.27-0.68) and have a higher prediction power to prostate cancer risk than SNP individual effects. CONCLUSIONS Novel SNP-SNP interactions in the CASC11-MYC-PVT1 region have a larger impact than SNP individual effects on prostate cancer risk in AA men. IMPACT This gene-gene interaction between CASC11 and PVT1 can provide valuable information to reveal potential biological mechanisms of prostate cancer development.
Affiliation(s)
- Hui-Yi Lin
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana.
| | - Catherine Y Callan
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
| | - Zhide Fang
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
| | - Heng-Yuan Tung
- Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
| | - Jong Y Park
- Department of Cancer Epidemiology, Moffitt Cancer Center and Research Institute, Tampa, Florida
| |
|
29
|
Dyke SOM, Linden M, Lappalainen I, De Argila JR, Carey K, Lloyd D, Spalding JD, Cabili MN, Kerry G, Foreman J, Cutts T, Shabani M, Rodriguez LL, Haeussler M, Walsh B, Jiang X, Wang S, Perrett D, Boughtwood T, Matern A, Brookes AJ, Cupak M, Fiume M, Pandya R, Tulchinsky I, Scollen S, Törnroos J, Das S, Evans AC, Malin BA, Beck S, Brenner SE, Nyrönen T, Blomberg N, Firth HV, Hurles M, Philippakis AA, Rätsch G, Brudno M, Boycott KM, Rehm HL, Baudis M, Sherry ST, Kato K, Knoppers BM, Baker D, Flicek P. Registered access: authorizing data access. Eur J Hum Genet 2018; 26:1721-1731. [PMID: 30069064 PMCID: PMC6244209 DOI: 10.1038/s41431-018-0219-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/26/2018] [Revised: 05/08/2018] [Accepted: 06/20/2018] [Indexed: 12/14/2022] Open
Abstract
The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model-"registered access"-to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.
Affiliation(s)
- Stephanie O M Dyke
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada.
- Montreal Neurological Institute, Faculty of Medicine, McGill University, Montreal, QC, Canada.
| | - Mikael Linden
- CSC - IT Center for Science, Espoo, Finland
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ilkka Lappalainen
- CSC - IT Center for Science, Espoo, Finland
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Jordi Rambla De Argila
- Centre for Genomic Regulation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | | | - David Lloyd
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- The Global Alliance for Genomics and Health, MaRS Centre, West Tower, 661 University Avenue, Suite 510, Toronto, M5G 0A3, ON, Canada
| | - J Dylan Spalding
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | - Giselle Kerry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Julia Foreman
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Tim Cutts
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mahsa Shabani
- Center for Biomedical Ethics and Law, Department of Public Health and Primary Care, University of Leuven, Leuven, Belgium
| | | | | | | | - Xiaoqian Jiang
- Department of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
| | - Shuang Wang
- Department of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
| | - Daniel Perrett
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Tiffany Boughtwood
- Australian Genomics Health Alliance, 50 Flemington Road, Parkville, VIC, 3052, Australia
| | | | - Anthony J Brookes
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | | | | | | | | | - Serena Scollen
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Samir Das
- McGill Centre for Integrative Neurosciences, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
| | - Alan C Evans
- McGill Centre for Integrative Neurosciences, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
| | | | - Stephan Beck
- UCL Cancer Institute, University College London, London, UK
| | - Steven E Brenner
- Department of Plant & Microbial Biology, University of California, Berkeley, CA, USA
| | - Tommi Nyrönen
- CSC - IT Center for Science, Espoo, Finland
- ELIXIR Compute Platform, ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Helen V Firth
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Matthew Hurles
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Gunnar Rätsch
- Department of Computer Science, Biomedical Informatics, ETH Zurich, Zurich, Switzerland
| | - Michael Brudno
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, Canada
| | - Kym M Boycott
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada
| | - Heidi L Rehm
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pathology, Brigham & Women's Hospital & Harvard Medical School, Boston, MA, USA
| | - Michael Baudis
- University of Zurich & Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Stephen T Sherry
- National Centre for Biotechnology Information, US National Library of Medicine, Bethesda, MD, USA
| | - Kazuto Kato
- Department of Biomedical Ethics and Public Policy, Graduate School of Medicine, Osaka University, Osaka, Japan
| | - Bartha M Knoppers
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada
| | - Dixie Baker
- Martin, Blanck & Associates, Alexandria, VA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| |
|
30
|
Griffin PC, Khadake J, LeMay KS, Lewis SE, Orchard S, Pask A, Pope B, Roessner U, Russell K, Seemann T, Treloar A, Tyagi S, Christiansen JH, Dayalan S, Gladman S, Hangartner SB, Hayden HL, Ho WWH, Keeble-Gagnère G, Korhonen PK, Neish P, Prestes PR, Richardson MF, Watson-Haigh NS, Wyres KL, Young ND, Schneider MV. Best practice data life cycle approaches for the life sciences. F1000Res 2018; 6:1618. [PMID: 30109017 DOI: 10.12688/f1000research.12344.1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Accepted: 08/17/2017] [Indexed: 11/20/2022] Open
Abstract
Throughout history, the life sciences have been revolutionised by technological advances; in our era this is manifested by advances in instrumentation for data generation, and consequently researchers now routinely handle large amounts of heterogeneous data in digital formats. The simultaneous transitions towards biology as a data science and towards a 'life cycle' view of research data pose new challenges. Researchers face a bewildering landscape of data management requirements, recommendations and regulations, without necessarily being able to access data management training or possessing a clear understanding of practical approaches that can assist in data management in their particular research domain. Here we provide an overview of best practice data life cycle approaches for researchers in the life sciences/bioinformatics space with a particular focus on 'omics' datasets and computer-based data processing and analysis. We discuss the different stages of the data life cycle and provide practical suggestions for useful tools and resources to improve data management practices.
Affiliation(s)
- Philippa C Griffin
- EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia.,Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Jyoti Khadake
- NIHR BioResource, University of Cambridge and Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, CB2 0QQ, UK
| | - Kate S LeMay
- Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
| | - Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Berkeley, CA, 94720, USA
| | - Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge, CB10 1SD, UK
| | - Andrew Pask
- School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Bernard Pope
- Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Ute Roessner
- Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Keith Russell
- Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
| | - Torsten Seemann
- Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Andrew Treloar
- Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
| | - Sonika Tyagi
- Australian Genome Research Facility Ltd, Parkville, VIC, 3052, Australia.,Monash Bioinformatics Platform, Monash University, Clayton, VIC, 3800, Australia
| | - Jeffrey H Christiansen
- Queensland Cyber Infrastructure Foundation and the University of Queensland Research Computing Centre, St Lucia, QLD, 4072, Australia
| | - Saravanan Dayalan
- Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Simon Gladman
- EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Sandra B Hangartner
- School of Biological Sciences, Monash University, Clayton, VIC, 3800, Australia
| | - Helen L Hayden
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
| | - William W H Ho
- School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Gabriel Keeble-Gagnère
- School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia.,Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
| | - Pasi K Korhonen
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Peter Neish
- The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Priscilla R Prestes
- Faculty of Science and Engineering, Federation University Australia, Mt Helen, VIC, 3350, Australia
| | - Mark F Richardson
- Bioinformatics Core Research Group & Centre for Integrative Ecology, Deakin University, Geelong, VIC, 3220, Australia
| | - Nathan S Watson-Haigh
- School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
| | - Kelly L Wyres
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Neil D Young
- Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | - Maria Victoria Schneider
- Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia.,The University of Melbourne, Parkville, VIC, 3010, Australia
| |
|
31
|
Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes. Sci Rep 2018; 8:8180. [PMID: 29802335 PMCID: PMC5970138 DOI: 10.1038/s41598-018-26310-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/01/2017] [Accepted: 05/10/2018] [Indexed: 12/16/2022] Open
Abstract
We applied two state-of-the-art, knowledge independent data-mining methods - Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE) - to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that wasn't detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.
|
32
|
Gene Variation of Endoplasmic Reticulum Aminopeptidases 1 and 2, and Risk of Blood Pressure Progression and Incident Hypertension among 17,255 Initially Healthy Women. Int J Genomics 2018; 2018:2308585. [PMID: 29850473 PMCID: PMC5933071 DOI: 10.1155/2018/2308585] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/31/2018] [Accepted: 03/14/2018] [Indexed: 02/07/2023] Open
Abstract
Recent studies have demonstrated the importance of endoplasmic reticulum aminopeptidase (ERAP) in blood pressure (BP) homeostasis. To date, no large prospective, genetic–epidemiological data are available on genetic variation within ERAP and hypertension risk. The association of 45 genetic variants of ERAP1 and ERAP2 was investigated in 17,255 Caucasian female participants from the Women's Genome Health Study. All subjects were free of hypertension at baseline. During an 18-year follow-up period, 10,216 incident hypertensive cases were identified. Multivariable linear, logistic, and Cox regression analyses were performed to assess the relationship of genotypes with baseline BP levels, BP progression at 48 months, and incident hypertension assuming an additive genetic model. Linear regression analyses showed associations of four tSNPs (ERAP1: rs27524; ERAP2: rs3733904, rs4869315, and rs2549782; all p < 0.05) with baseline systolic BP levels. Three tSNPs (ERAP1: rs27851, rs27429, and rs34736, all p < 0.05) were associated with baseline diastolic BP levels. Multivariable logistic regression analysis showed that ERAP1 rs27772 was associated with BP progression at 48 months (p = 0.0366). Multivariable Cox regression analysis showed an association of three tSNPs (ERAP1: rs469783 and rs10050860; ERAP2: rs2927615; all p < 0.05) with risk of incident hypertension. Analyses of dbGaP for genotype–phenotype association and GTEx Portal for gene expression quantitative trait loci revealed five tSNPs with differential association of BP and nine tSNPs with lower ERAP1 and ERAP2 mRNA expression levels, respectively. The present study suggests that ERAP1 and ERAP2 gene variation may be useful for risk assessment of BP progression and the development of hypertension.
|
33
|
Corpas M, Kovalevskaya NV, McMurray A, Nielsen FGG. A FAIR guide for data providers to maximise sharing of human genomic data. PLoS Comput Biol 2018; 14:e1005873. [PMID: 29543799 PMCID: PMC5854239 DOI: 10.1371/journal.pcbi.1005873] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/18/2022] Open
Abstract
It is generally acknowledged that, for reproducibility and progress of human genomic research, data sharing is critical. For every sharing transaction, a successful data exchange is produced between a data consumer and a data provider. Providers of human genomic data (e.g., publicly or privately funded repositories and data archives) fulfil their social contract with data donors when their shareable data conforms to FAIR (findable, accessible, interoperable, reusable) principles. Based on our experiences via Repositive (https://repositive.io), a leading discovery platform cataloguing all shared human genomic datasets, we propose guidelines for data providers wishing to maximise their shared data's FAIRness.
Affiliation(s)
- Manuel Corpas
- Repositive Ltd, Betjeman House, Cambridge, United Kingdom
| | | | | | | |
|
34
|
Burke W, Beskow LM, Trinidad SB, Fullerton SM, Brelsford K. Informed Consent in Translational Genomics: Insufficient Without Trustworthy Governance. J Law Med Ethics 2018; 46:79-86. [PMID: 29962827 PMCID: PMC6023399 DOI: 10.1177/1073110518766023] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
Abstract
Neither the range of potential results from genomic research that might be returned to participants nor future uses of stored data and biospecimens can be fully predicted at the outset of a study. Informed consent procedures require clear explanations about how and by whom decisions are made and what principles and criteria apply. To ensure trustworthy research governance, there is also a need for empirical studies incorporating public input to evaluate and strengthen these processes.
Affiliation(s)
- Wylie Burke
- Department of Bioethics and Humanities, Box 357120, University of Washington, Seattle, WA 98195
| | - Laura M Beskow
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, 2525 West End Ave, Suite 400, Nashville, TN 37203
| | - Susan Brown Trinidad
- Department of Bioethics and Humanities, Box 357120, University of Washington, Seattle, WA 98195
| | - Stephanie M Fullerton
- Department of Bioethics and Humanities, Box 357120, University of Washington, Seattle, WA 98195
| | - Kathleen Brelsford
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, 2525 West End Ave, Suite 400, Nashville, TN 37203
| |
|
35
|
Lert-Itthiporn W, Suktitipat B, Grove H, Sakuntabhai A, Malasit P, Tangthawornchaikul N, Matsuda F, Suriyaphol P. Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome. BMC Med Genet 2018; 19:23. [PMID: 29439659 PMCID: PMC5812212 DOI: 10.1186/s12881-018-0534-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/07/2016] [Accepted: 01/24/2018] [Indexed: 11/24/2022]
Abstract
Background: Imputation involves the inference of untyped single nucleotide polymorphisms (SNPs) in genome-wide association studies. The haplotypic reference of choice for imputation in Southeast Asian populations is unclear. Moreover, the influence of SNP annotation on imputation results has not been examined. Methods: This study was divided into two parts. In the first part, we applied imputation to genotyped SNPs from Southeast Asian populations in the Pan-Asian SNP database. Five percent of the total SNPs were removed, and the remaining SNPs were imputed with IMPUTE2. The imputed outcomes were verified against the removed SNPs. We compared imputation references from Chinese and Japanese haplotypes from the HapMap phase II (HMII) and the complete set of haplotypes from the 1000 Genomes Project (1000G). The second part assessed imputation accuracy and yield in a Thai patient dataset. Half of the autosomal SNPs were removed to create Set 1. Another dataset, Set 2, was then created by removing the other half of the SNPs instead. Both Set 1 and Set 2 were imputed with HMII to create a complete imputed SNP dataset. This dataset was used to validate association testing, SNP annotation and imputation outcome. Results: The accuracy was highest for all populations when using the HMII reference, but at the cost of a lower yield. Thai genotypes showed the highest accuracy over other populations in both HMII and 1000G panels, although accuracy and yield varied across chromosomes. Imputation was tested in a clinical dataset to compare accuracy in gene-related regions, and coding regions were found to have a higher accuracy and yield. Conclusions: This work provides the first evidence of imputation reference selection for Southeast Asian studies and highlights the effects of SNP locations relative to genes on imputation outcome. Researchers will need to consider the trade-off between accuracy and yield in future imputation studies.
Affiliation(s)
- Worachart Lert-Itthiporn
- Molecular Medicine Graduate Program, Faculty of Science, Mahidol University, Bangkok, Thailand.,Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Bhoom Suktitipat
- Integrative Computational BioScience Center, Department of Biochemistry, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand.,Center of Excellence in Bioinformatics and Clinical Data Management, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Harald Grove
- Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand.,Center of Excellence in Bioinformatics and Clinical Data Management, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Anavaj Sakuntabhai
- Unité de Génétique Fonctionnelle des Maladies Infectieuses, Department Genome and Genetics, Institut Pasteur, Paris, France.,Centre National de la Recherche Scientifique, URA3012, Paris, France.,Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, Thailand
| | - Prida Malasit
- Medical Biotechnology Research Unit, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Bangkok, Thailand.,Division of Dengue Hemorrhagic Fever Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Nattaya Tangthawornchaikul
- Medical Biotechnology Research Unit, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Bangkok, Thailand.,Division of Dengue Hemorrhagic Fever Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Fumihiko Matsuda
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Prapat Suriyaphol
- Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand.,Center of Excellence in Bioinformatics and Clinical Data Management, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| |
|
36
|
Griffin PC, Khadake J, LeMay KS, Lewis SE, Orchard S, Pask A, Pope B, Roessner U, Russell K, Seemann T, Treloar A, Tyagi S, Christiansen JH, Dayalan S, Gladman S, Hangartner SB, Hayden HL, Ho WWH, Keeble-Gagnère G, Korhonen PK, Neish P, Prestes PR, Richardson MF, Watson-Haigh NS, Wyres KL, Young ND, Schneider MV. Best practice data life cycle approaches for the life sciences. F1000Res 2017; 6:1618. [PMID: 30109017 PMCID: PMC6069748 DOI: 10.12688/f1000research.12344.2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Accepted: 05/29/2018] [Indexed: 11/20/2022] Open
|