1. Wang M, Li F, Wu H, Liu Q, Li S. PredPromoter-MF(2L): A Novel Approach of Promoter Prediction Based on Multi-source Feature Fusion and Deep Forest. Interdiscip Sci 2022; 14:697-711. [PMID: 35488998 DOI: 10.1007/s12539-022-00520-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2021] [Revised: 04/05/2022] [Accepted: 04/05/2022] [Indexed: 12/12/2022]
Abstract
Promoters, short DNA sequences, play vital roles in initiating gene transcription. However, identifying promoters in a high-throughput manner with conventional experimental techniques remains a challenge. To this end, several computational predictors based on machine learning models have been developed, but their performance remains unsatisfactory. In this study, we proposed a novel two-layer predictor, called PredPromoter-MF(2L), based on multi-source feature fusion and ensemble learning. PredPromoter-MF(2L) combines deep features learned by a pre-trained deep learning network model with sequence-derived features. XGBoost-based feature selection was applied to reduce the dimensionality of the fused features, and a cascade deep forest model was trained on the selected feature subset for promoter prediction. The results of both fivefold cross-validation and independent testing demonstrated that PredPromoter-MF(2L) outperformed state-of-the-art methods.
Affiliation(s)
- Miao Wang
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Fuyi Li
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, VIC, 3000, Australia
- Hao Wu
  - School of Software, Shandong University, Jinan, 250100, Shandong, China
- Quanzhong Liu
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China

2. Silva MC, Eugénio P, Faria D, Pesquita C. Ontologies and Knowledge Graphs in Oncology Research. Cancers (Basel) 2022; 14:cancers14081906. [PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/08/2022] [Revised: 03/25/2022] [Accepted: 04/07/2022] [Indexed: 11/16/2022] Open
Abstract
The complexity of cancer research stems from leaning on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer—which is critical for precision medicine approaches—hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role which biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, facilitate data interpretation and data mining, and more recently, with the emergence of the knowledge graph paradigm, support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the available large volumes of heterogeneous data.

3. Ong E, Wang LL, Schaub J, O'Toole JF, Steck B, Rosenberg AZ, Dowd F, Hansen J, Barisoni L, Jain S, de Boer IH, Valerius MT, Waikar SS, Park C, Crawford DC, Alexandrov T, Anderton CR, Stoeckert C, Weng C, Diehl AD, Mungall CJ, Haendel M, Robinson PN, Himmelfarb J, Iyengar R, Kretzler M, Mooney S, He Y. Modelling kidney disease using ontology: insights from the Kidney Precision Medicine Project. Nat Rev Nephrol 2020; 16:686-696. [PMID: 32939051 PMCID: PMC8012202 DOI: 10.1038/s41581-020-00335-w] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Accepted: 07/24/2020] [Indexed: 12/29/2022]
Abstract
An important need exists to better understand and stratify kidney disease according to its underlying pathophysiology in order to develop more precise and effective therapeutic agents. National collaborative efforts such as the Kidney Precision Medicine Project are working towards this goal through the collection and integration of large, disparate clinical, biological and imaging data from patients with kidney disease. Ontologies are powerful tools that facilitate these efforts by enabling researchers to organize and make sense of different data elements and the relationships between them. Ontologies are critical to support the types of big data analysis necessary for kidney precision medicine, where heterogeneous clinical, imaging and biopsy data from diverse sources must be combined to define a patient's phenotype. The development of two new ontologies - the Kidney Tissue Atlas Ontology and the Ontology of Precision Medicine and Investigation - will support the creation of the Kidney Tissue Atlas, which aims to provide a comprehensive molecular, cellular and anatomical map of the kidney. These ontologies will improve the annotation of kidney-relevant data, and eventually lead to new definitions of kidney disease in support of precision medicine.
Affiliation(s)
- Edison Ong
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Lucy L Wang
  - Allen Institute for Artificial Intelligence, Seattle, WA, USA
- Jennifer Schaub
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- John F O'Toole
  - Department of Nephrology and Hypertension, Glickman Urological and Kidney Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Becky Steck
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Avi Z Rosenberg
  - Department of Pathology, Johns Hopkins University, Baltimore, MD, USA
- Frederick Dowd
  - UW Medicine Research IT, University of Washington, Seattle, WA, USA
- Jens Hansen
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Laura Barisoni
  - Division of AI/Computational Pathology, Department of Pathology, and Division of Nephrology, Department of Medicine, Duke University, Durham, NC, USA
- Sanjay Jain
  - Division of Nephrology, School of Medicine, Washington University in St. Louis, St Louis, MO, USA
- Ian H de Boer
  - Division of Nephrology, Department of Medicine, University of Washington, Seattle, WA, USA
- M Todd Valerius
  - Division of Renal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Sushrut S Waikar
  - Section of Nephrology, Boston University Medical Center, Boston, MA, USA
- Christopher Park
  - Kidney Research Institute, University of Washington, Seattle, WA, USA
- Dana C Crawford
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Cleveland Institute for Computational Biology, Cleveland, OH, USA
- Theodore Alexandrov
  - Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
- Christian Stoeckert
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania Philadelphia, Philadelphia, PA, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Alexander D Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Christopher J Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Melissa Haendel
  - Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jonathan Himmelfarb
  - Division of Nephrology, Department of Medicine, University of Washington, Seattle, WA, USA
  - Kidney Research Institute, University of Washington, Seattle, WA, USA
- Ravi Iyengar
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Institute for Systems Biomedicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Matthias Kretzler
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
  - Division of Nephrology, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Sean Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Yongqun He
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
  - Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA

4. Stathias V, Turner J, Koleti A, Vidovic D, Cooper D, Fazel-Najafabadi M, Pilarczyk M, Terryn R, Chung C, Umeano A, Clarke DJB, Lachmann A, Evangelista JE, Ma’ayan A, Medvedovic M, Schürer SC. LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res 2020; 48:D431-D439. [PMID: 31701147 PMCID: PMC7145650 DOI: 10.1093/nar/gkz1023] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Received: 09/15/2019] [Revised: 10/17/2019] [Accepted: 11/04/2019] [Indexed: 12/21/2022] Open
Abstract
The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program with the goal of generating a large-scale and comprehensive catalogue of perturbation-response signatures by utilizing a diverse collection of perturbations across many model systems and assay types. The LINCS Data Portal (LDP) has been the primary access point for the compendium of LINCS data and has been widely utilized. Here, we report the first major update of LDP (http://lincsportal.ccs.miami.edu/signatures) with substantial changes in the data architecture and APIs, a completely redesigned user interface, and enhanced curated metadata annotations to support more advanced, intuitive and deeper querying, exploration and analysis capabilities. The cornerstone of this update has been the decision to reprocess all high-level LINCS datasets and make them accessible at the data point level enabling users to directly access and download any subset of signatures across the entire library independent from the originating source, project or assay. Access to the individual signatures also enables the newly implemented signature search functionality, which utilizes the iLINCS platform to identify conditions that mimic or reverse gene set queries. A newly designed query interface enables global metadata search with autosuggest across all annotations associated with perturbations, model systems, and signatures.
Affiliation(s)
- Vasileios Stathias
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
  - Center for Computational Science, University of Miami, USA
  - BD2K-LINCS Data Coordination and Integration Center, USA
- John Turner
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
  - BD2K-LINCS Data Coordination and Integration Center, USA
- Amar Koleti
  - Center for Computational Science, University of Miami, USA
  - BD2K-LINCS Data Coordination and Integration Center, USA
- Dusica Vidovic
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
  - BD2K-LINCS Data Coordination and Integration Center, USA
- Daniel Cooper
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
  - BD2K-LINCS Data Coordination and Integration Center, USA
- Mehdi Fazel-Najafabadi
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Laboratory for Statistical Genomics and Systems Biology, Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati College of Medicine, USA
- Marcin Pilarczyk
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Laboratory for Statistical Genomics and Systems Biology, Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati College of Medicine, USA
- Raymond Terryn
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
- Caty Chung
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, USA
- Afoma Umeano
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
- Daniel J B Clarke
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, USA
- Alexander Lachmann
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, USA
- John Erol Evangelista
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, USA
- Avi Ma’ayan
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, USA
- Mario Medvedovic
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Laboratory for Statistical Genomics and Systems Biology, Division of Biostatistics and Bioinformatics, Department of Environmental Health, University of Cincinnati College of Medicine, USA
- Stephan C Schürer
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, USA
  - Center for Computational Science, University of Miami, USA
  - BD2K-LINCS Data Coordination and Integration Center, USA
  - Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, USA
  - To whom correspondence should be addressed. Tel: +1 305 243 6552;

5. Biomedical ontologies and their development, management, and applications in and beyond China. JOURNAL OF BIO-X RESEARCH 2019. [DOI: 10.1097/jbr.0000000000000051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022] Open

6. Musa A, Tripathi S, Dehmer M, Emmert-Streib F. L1000 Viewer: A Search Engine and Web Interface for the LINCS Data Repository. Front Genet 2019; 10:557. [PMID: 31258549 PMCID: PMC6588157 DOI: 10.3389/fgene.2019.00557] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/13/2019] [Accepted: 05/28/2019] [Indexed: 12/12/2022] Open
Abstract
The LINCS L1000 data repository contains almost two million gene expression profiles for thousands of small molecules and drugs. However, the complexity and size of the data repository, together with the lack of an interoperable interface, severely hamper the creation of pharmacologically meaningful workflows that use these data. To overcome this limitation, we developed the L1000 Viewer, a search engine and graphical web interface for the LINCS data repository. The web interface serves as an interactive platform allowing the user to select different forms of perturbation profiles, e.g., for specific cell lines, drugs, dosages, time points and combinations thereof. At its core, our method relies on a database that we created by inferring and exploiting the intricate dependency graph structure among the data files. The L1000 Viewer is accessible via http://L1000viewer.bio-complexity.com/.
Affiliation(s)
- Aliyu Musa
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere, Finland
- Shailesh Tripathi
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Linz, Austria
- Matthias Dehmer
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Linz, Austria
  - Department of Mechatronics and Biomedical Computer Science, UMIT, Hall in Tyrol, Austria
  - College of Computer and Control Engineering, Nankai University, Tianjin, China
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere, Finland

7. Pan H, Bian X, Yang S, He Y, Yang X, Liu Y. The cell line ontology-based representation, integration and analysis of cell lines used in China. BMC Bioinformatics 2019; 20:179. [PMID: 31272367 PMCID: PMC6509802 DOI: 10.1186/s12859-019-2724-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The Chinese National Infrastructure of Cell Line Resource (NICR) stores and distributes cell lines for biomedical research in China. This study aims to represent and integrate the information of NICR cell lines into the community-based Cell Line Ontology (CLO).
RESULTS: We have aligned, represented, and added all 2704 identified cell line cells in NICR to CLO. We also proposed new ontology design patterns to represent the usage of cell line cells as disease models by inducing tumor formation in model organisms, and the relations between cell line cells and their expressed or overexpressed genes or proteins. The resulting CLO-NICR ontology also includes the Chinese representation of the NICR cell line information. CLO-NICR was merged into the general CLO. To serve the cell research community in China, the Chinese version of CLO-NICR was also generated and deposited in the OntoChina ontology repository. The usage of CLO-NICR was demonstrated by DL query and knowledge extraction.
CONCLUSIONS: In summary, all identified cell lines from NICR are represented by the semantic framework of CLO and incorporated into CLO as its most recent update. We also generated CLO-NICR and its Chinese view (CLO-NICR-Cv). The development of CLO-NICR and CLO-NICR-Cv allows the integration of the cell lines from NICR into the community-based CLO ontology and provides an integrative platform to support different applications of CLO in China.
Affiliation(s)
- Hongjie Pan
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Xiaocui Bian
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Sheng Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
- Xiaolin Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Yuqin Liu
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China

8. Sarntivijai S, He Y, Diehl AD. Cells in ExperimentaL Life Sciences (CELLS-2018): capturing the knowledge of normal and diseased cells with ontologies. BMC Bioinformatics 2019; 20:183. [PMID: 31272374 PMCID: PMC6509796 DOI: 10.1186/s12859-019-2721-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Cell cultures and cell lines are widely used in life science experiments. In conjunction with the 2018 International Conference on Biomedical Ontology (ICBO-2018), the 2nd International Workshop on Cells in ExperimentaL Life Science (CELLS-2018) focused on two themes of knowledge representation, for newly-discovered cell types and for cells in disease states. This workshop included five oral presentations and a general discussion session. Two new ontologies, including the Cancer Cell Ontology (CCL) and the Ontology for Stem Cell Investigations (OSCI), were reported in the workshop. In another representation, the Cell Line Ontology (CLO) framework was applied and extended to represent cell line cells used in China and their Chinese representation. Other presentations included a report on the application of ontologies to cross-compare cell types and marker patterns used in flow cytometry studies, and a presentation on new experimental findings about novel cell types based on single cell RNA sequencing assay and their corresponding ontological representation. The general discussion session focused on the ontology design patterns in representing newly-discovered cell types and cells in disease states.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI USA
- Alexander D. Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA

9. Musa A, Tripathi S, Kandhavelu M, Dehmer M, Emmert-Streib F. Harnessing the biological complexity of Big Data from LINCS gene expression signatures. PLoS One 2018; 13:e0201937. [PMID: 30157183 PMCID: PMC6114505 DOI: 10.1371/journal.pone.0201937] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/15/2018] [Accepted: 07/24/2018] [Indexed: 01/02/2023] Open
Abstract
Gene expression profiling using transcriptional drug perturbations is useful for many biomedical discovery studies, including drug repurposing, elucidation of drug mechanisms of action (MoA) and many other pharmacogenomic applications. However, limited data availability across cell types has severely hindered progress in these areas. To fill this gap, the LINCS program recently generated almost 1.3 million profiles for over 40,000 drug and genetic perturbations in over 70 different human cell types, including meta information about the experimental conditions and cell lines. Unfortunately, Big Data like those generated by the ongoing LINCS program do not yield easy insights but pose considerable challenges for their analysis. In this paper, we address some of these challenges. First, we study the gene expression signature profiles from all cell lines and their perturbagents in order to gain insight into the distributional characteristics of the available conditions. Second, we investigate the differential expression of genes for all cell lines, obtaining an understanding of condition-dependent differential expression that manifests the biological complexity of perturbagents. As a result, our analysis helps the experimental design of follow-up studies, e.g., by selecting appropriate cell lines.
Affiliation(s)
- Aliyu Musa
  - Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  - Molecular Signaling Lab, Faculty of Biomedical Sciences and Engineering, Tampere University of Technology, Tampere, Finland
- Shailesh Tripathi
  - Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  - University of Applied Sciences Upper Austria, Steyr, Austria
- Meenakshisundaram Kandhavelu
  - Molecular Signaling Lab, Faculty of Biomedical Sciences and Engineering, Tampere University of Technology, Tampere, Finland
  - BioMediTech Institute, Tampere University of Technology, Tampere, Finland
- Matthias Dehmer
  - University of Applied Sciences Upper Austria, Steyr, Austria
  - Institute for Bioinformatics and Translational Research, UMIT- The Health and Life Sciences University, Hall in Tyrol, Austria
- Frank Emmert-Streib
  - Predictive Medicine and Data Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  - Molecular Signaling Lab, Faculty of Biomedical Sciences and Engineering, Tampere University of Technology, Tampere, Finland

10. Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center. Sci Data 2018; 5:180117. [PMID: 29917015 PMCID: PMC6007090 DOI: 10.1038/sdata.2018.117] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 10/02/2017] [Accepted: 05/11/2018] [Indexed: 12/18/2022] Open
Abstract
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles.