1
Paganelli M, Sottovia P, Park K, Interlandi M, Guerra F. Pushing ML Predictions Into DBMSs. IEEE Trans Knowl Data Eng 2023; 35:10295-10308. [PMID: 37954972 PMCID: PMC10620958 DOI: 10.1109/tkde.2023.3269592] [Received: 01/19/2022] [Revised: 02/12/2023] [Accepted: 04/08/2023] [Indexed: 11/14/2023]
Abstract
In the past decade, many approaches have been suggested to execute ML workloads on a DBMS. However, most of them have looked at in-DBMS ML from a training perspective, whereas ML inference has been largely overlooked. We think that this is an important gap to fill for two main reasons: (1) in the near future, every application will be infused with some sort of ML capability; (2) behind every web page, application, and enterprise there is a DBMS, which makes in-DBMS inference appealing for efficiency (e.g., less data movement), performance (e.g., cross-optimizations between relational operators and ML), and governance. In this article, we study whether DBMSs are a good fit for prediction serving. We introduce a technique for translating trained ML pipelines containing both featurizers (e.g., one-hot encoding) and models (e.g., linear and tree-based models) into SQL queries, and we compare in-DBMS performance against popular ML frameworks such as Sklearn and ml.net. Our experiments show that, when pushed inside a DBMS, trained ML pipelines can perform comparably to ML frameworks in several scenarios, while they perform quite poorly on text featurization and on (even simple) neural networks.
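The abstract does not spell out the translation rules. As a minimal sketch of the general idea, assuming a pipeline of one-hot encoding followed by a linear model (a tree-based model could instead be expressed as nested `CASE` expressions), the Python code below compiles such a pipeline into a single SQL query and cross-checks it in SQLite against a plain-Python prediction. The table `houses`, its columns, and all weights are hypothetical; this is not the authors' implementation.

```python
import sqlite3


def sql_literal(value):
    """Render a category value as a SQL string literal (single quotes doubled)."""
    return "'" + str(value).replace("'", "''") + "'"


def linear_pipeline_to_sql(table, intercept, numeric_weights, categorical_weights):
    """Compile one-hot encoding followed by a linear model into one SELECT.

    numeric_weights:     {column: weight} for numeric features used as-is.
    categorical_weights: {column: {category: weight}}; each (column, category)
                         pair is one one-hot indicator. Unlisted categories
                         contribute 0, like an encoder that ignores unknowns.
    """
    terms = [repr(float(intercept))]
    for column, weight in numeric_weights.items():
        terms.append(f"{float(weight)!r} * {column}")
    for column, categories in categorical_weights.items():
        for category, weight in categories.items():
            # Each one-hot indicator becomes a CASE expression yielding 1 or 0.
            indicator = f"(CASE WHEN {column} = {sql_literal(category)} THEN 1 ELSE 0 END)"
            terms.append(f"{float(weight)!r} * {indicator}")
    return f"SELECT id, {' + '.join(terms)} AS prediction FROM {table} ORDER BY id"


def predict_in_python(row, intercept, numeric_weights, categorical_weights):
    """Reference implementation of the same pipeline, for cross-checking."""
    score = float(intercept)
    for column, weight in numeric_weights.items():
        score += weight * row[column]
    for column, categories in categorical_weights.items():
        score += categories.get(row[column], 0.0)
    return score


def demo():
    # Hypothetical model: prediction = 5 + 2 * area + one-hot(city) weights.
    intercept = 5.0
    numeric = {"area": 2.0}
    categorical = {"city": {"rome": 10.0, "milan": 20.0}}
    rows = [
        {"id": 1, "area": 50.0, "city": "rome"},
        {"id": 2, "area": 80.0, "city": "milan"},
        {"id": 3, "area": 60.0, "city": "turin"},  # category unseen by the encoder
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE houses (id INTEGER, area REAL, city TEXT)")
    conn.executemany("INSERT INTO houses VALUES (:id, :area, :city)", rows)
    sql = linear_pipeline_to_sql("houses", intercept, numeric, categorical)
    in_db = conn.execute(sql).fetchall()
    conn.close()
    in_python = [(r["id"], predict_in_python(r, intercept, numeric, categorical)) for r in rows]
    return in_db, in_python


if __name__ == "__main__":
    in_db, in_python = demo()
    print(in_db)
    assert in_db == in_python
```

Because the whole pipeline becomes one expression in the `SELECT` list, the DBMS can evaluate it next to filters and joins without moving rows out of the database.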
2
Al-Aamri A, Kamarul Azman S, Daw Elbait G, Alsafar H, Henschel A. Critical assessment of on-premise approaches to scalable genome analysis. BMC Bioinformatics 2023; 24:354. [PMID: 37735350 PMCID: PMC10512525 DOI: 10.1186/s12859-023-05470-2] [Received: 05/27/2023] [Accepted: 09/08/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND Plummeting DNA sequencing costs in recent years have enabled genome sequencing projects to scale up by several orders of magnitude, transforming genomics into a highly data-intensive field of research. This development provides the much-needed statistical power required for genotype-phenotype predictions in complex diseases. METHODS To leverage this wealth of information efficiently, we assessed several genomic data science tools. We focused on on-premise installations to cover situations where data confidentiality, compliance regulations, or similar constraints rule out cloud-based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability. RESULTS Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects, with varying degrees of scalability, in comparison to flat-file manipulation (e.g., BCFtools and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database. CONCLUSION The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.
Affiliation(s)
- Amira Al-Aamri
- Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Syafiq Kamarul Azman
- Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Gihan Daw Elbait
- Department of Biology, College of Arts and Sciences, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Habiba Alsafar
- Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Andreas Henschel
- Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates.
- Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates.
3
Rostam Niakan Kalhori S, Deserno TM, Soleiman J, Kasiri Habibabadi S. Data Sharing Platform for MIMIC-IV and MIMIC-ED Data Marts: Designing a Data Retrieving System Based on the Intra-Hospital Patient Transfer Pathway. Stud Health Technol Inform 2023; 302:98-102. [PMID: 37203617 DOI: 10.3233/shti230072] [Indexed: 05/20/2023]
Abstract
Access to high-quality historical patient data in hospitals may facilitate the development of predictive models and data analysis experiments. This study provides a design for a data-sharing platform based on all possible criteria for the Medical Information Mart for Intensive Care IV (MIMIC-IV) and the emergency department data mart MIMIC-ED. Tables containing columns of medical attributes and outcomes were studied by a team of 5 experts in medical informatics. They fully agreed on connecting the tables using subject-id, HDM-id, and stay-id as foreign keys. The tables of the two marts were considered along the intra-hospital patient transfer pathway with various outcomes. Using these constraints, queries were generated and applied to the backend of the platform. The proposed user interface was designed to retrieve records based on various entry criteria and to present the output as a dashboard or a graph. This design is a step toward developing a platform that is useful for studies of patient trajectories, medical outcome prediction, or other studies that require heterogeneous data entries.
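To make the foreign-key linkage concrete, here is a hypothetical three-table toy schema in SQLite that follows the MIMIC-IV naming convention (`subject_id` for the patient, `hadm_id` for the hospital admission, `stay_id` for a unit stay); we assume the abstract's "HDM-id" refers to `hadm_id`. The column subset and all rows are invented for illustration and are not MIMIC data.

```python
import sqlite3

# Hypothetical, heavily simplified subset of MIMIC-IV-style tables.
SCHEMA = """
CREATE TABLE patients   (subject_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE admissions (hadm_id INTEGER PRIMARY KEY,
                         subject_id INTEGER REFERENCES patients(subject_id),
                         hospital_expire_flag INTEGER);
CREATE TABLE icustays   (stay_id INTEGER PRIMARY KEY,
                         subject_id INTEGER REFERENCES patients(subject_id),
                         hadm_id INTEGER REFERENCES admissions(hadm_id),
                         first_careunit TEXT);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patients VALUES (?, ?)", [(1, "F"), (2, "M")])
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)",
                     [(10, 1, 0), (11, 1, 1), (20, 2, 0)])
    conn.executemany("INSERT INTO icustays VALUES (?, ?, ?, ?)",
                     [(100, 1, 11, "MICU"), (200, 2, 20, "CCU")])
    return conn


def icu_trajectories(conn, careunit=None):
    """Follow patient -> admission -> ICU stay, optionally filtered by care unit.

    Inner joins drop admissions without an ICU stay; passing careunit=None
    makes the WHERE clause a no-op.
    """
    sql = """
        SELECT p.subject_id, a.hadm_id, i.stay_id, i.first_careunit, a.hospital_expire_flag
        FROM patients p
        JOIN admissions a ON a.subject_id = p.subject_id
        JOIN icustays   i ON i.hadm_id = a.hadm_id AND i.subject_id = p.subject_id
        WHERE (? IS NULL OR i.first_careunit = ?)
        ORDER BY i.stay_id
    """
    return conn.execute(sql, (careunit, careunit)).fetchall()


if __name__ == "__main__":
    print(icu_trajectories(build_db(), "CCU"))
```

Entry criteria from a user interface map naturally onto bound parameters like `careunit`, which keeps the generated queries fixed while the filters vary.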
Affiliation(s)
- Sharareh Rostam Niakan Kalhori
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Thomas M Deserno
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany
- Jamal Soleiman
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany
- Shayan Kasiri Habibabadi
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany
4
Hrisafov MK, Ivanov MEAB, Chivarov APN, Chivarov MES. Cost-effective automated multipurpose delivery system software for hospitals. IFAC Pap OnLine 2022; 55:437-442. [PMID: 38620881 PMCID: PMC9764834 DOI: 10.1016/j.ifacol.2022.12.076] [Indexed: 12/24/2022]
Abstract
The COVID-19 pandemic has impacted every aspect of our society, and national health systems are among the worst affected. Our goal is to provide a proof of concept for a cost-effective automated delivery system that hospitals can use to distribute medicine and food to patients in non-intensive wards, so that medical personnel's exposure to the virus is minimized. Only free and open-source software tools are used. A working proof of concept of the system was created, consisting of a robot platform running ROS, a SQL Server relational database, and a web app. Limitations were identified, and testing was successful. We have shown that it is possible to create such a system using free and open-source software and tools.
Affiliation(s)
- Mag Kocho Hrisafov
- Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Mag Eng Stefan Chivarov
- Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
5
Fraga KJ, Huang YJ, Ramelot TA, Swapna GVT, Lashawn Anak Kendary A, Li E, Korf I, Montelione GT. SpecDB: A relational database for archiving biomolecular NMR spectral data. J Magn Reson 2022; 342:107268. [PMID: 35930941 PMCID: PMC9922030 DOI: 10.1016/j.jmr.2022.107268] [Received: 01/25/2022] [Revised: 06/16/2022] [Accepted: 07/06/2022] [Indexed: 05/11/2023]
Abstract
NMR is a valuable experimental tool in the structural biologist's toolkit to elucidate the structures, functions, and motions of biomolecules. The progress of machine learning, particularly in structural biology, reveals the critical importance of large, diverse, and reliable datasets in developing new methods and understanding in structural biology and science more broadly. Biomolecular NMR research groups produce large amounts of data, and there is renewed interest in organizing these data to train new, sophisticated machine learning architectures and to improve biomolecular NMR analysis pipelines. The foundational data type in NMR is the free-induction decay (FID). There are opportunities to build sophisticated machine learning methods to tackle long-standing problems in NMR data processing, resonance assignment, dynamics analysis, and structure determination using NMR FIDs. Our goal in this study is to provide a lightweight, broadly available tool for archiving FID data as it is generated at the spectrometer, and grow a new resource of FID data and associated metadata. This study presents a relational schema for storing and organizing the metadata items that describe an NMR sample and FID data, which we call Spectral Database (SpecDB). SpecDB is implemented in SQLite and includes a Python software library providing a command-line application to create, organize, query, backup, share, and maintain the database. This set of software tools and database schema allow users to store, organize, share, and learn from NMR time domain data. SpecDB is freely available under an open source license at https://github.rpi.edu/RPIBioinformatics/SpecDB.
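The abstract does not list SpecDB's actual tables. As a hypothetical minimal sketch of the idea of keeping sample metadata and raw time-domain data in one SQLite file, the code below defines two invented tables (`sample`, `fid`), stores complex FID points as a binary BLOB, and queries FIDs by sample metadata. Table names, columns, and the serialization format are our assumptions, not SpecDB's schema.

```python
import sqlite3
import struct

# Hypothetical two-table schema: one row per sample, many FIDs per sample.
SCHEMA = """
CREATE TABLE sample (
    sample_id     TEXT PRIMARY KEY,
    protein       TEXT NOT NULL,
    buffer        TEXT,
    temperature_k REAL
);
CREATE TABLE fid (
    fid_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id  TEXT NOT NULL REFERENCES sample(sample_id),
    experiment TEXT NOT NULL,   -- e.g. a pulse-sequence name
    n_points   INTEGER NOT NULL,
    data       BLOB NOT NULL    -- raw complex time-domain points
);
"""


def pack_fid(points):
    """Serialize complex points as interleaved little-endian float64 (re, im)."""
    flat = [v for z in points for v in (z.real, z.imag)]
    return struct.pack(f"<{len(flat)}d", *flat)


def unpack_fid(blob):
    """Inverse of pack_fid."""
    n = len(blob) // 8
    flat = struct.unpack(f"<{n}d", blob)
    return [complex(flat[i], flat[i + 1]) for i in range(0, n, 2)]


def archive_fid(conn, sample_id, experiment, points):
    """Store one FID for an existing sample and return its new id."""
    cur = conn.execute(
        "INSERT INTO fid (sample_id, experiment, n_points, data) VALUES (?, ?, ?, ?)",
        (sample_id, experiment, len(points), pack_fid(points)),
    )
    return cur.lastrowid


def fids_for_protein(conn, protein):
    """Return (fid_id, experiment, points) for every FID of a given protein."""
    rows = conn.execute(
        """SELECT f.fid_id, f.experiment, f.data
           FROM fid f JOIN sample s ON s.sample_id = f.sample_id
           WHERE s.protein = ? ORDER BY f.fid_id""",
        (protein,),
    ).fetchall()
    return [(fid_id, exp, unpack_fid(blob)) for fid_id, exp, blob in rows]
```

Keeping metadata in ordinary columns and the FID itself in a BLOB lets SQL select training data by sample properties without parsing the raw spectra.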
Affiliation(s)
- Keith J Fraga
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
- Yuanpeng J Huang
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
- Theresa A Ramelot
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
- G V T Swapna
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA; Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers The State University of New Jersey, Piscataway, NJ 08854, USA.
- Ethan Li
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
- Ian Korf
- Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
- Gaetano T Montelione
- Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 USA.
6
Padmavathi P, Chandrashekar K, Setlur AS, Niranjan V. MutaXome: A Novel Database for Identified Somatic Variations of In silico Analyzed Cancer Exome Datasets. Cancer Inform 2022; 21:11769351221097593. [PMID: 35586731 PMCID: PMC9109167 DOI: 10.1177/11769351221097593] [Received: 09/28/2021] [Accepted: 04/09/2022] [Indexed: 11/17/2022]
Abstract
Advancements in cancer research have given researchers and clinicians access to massive amounts of data to aid cancer patients and to add to existing knowledge. However, despite the existence of reliable sources for extracting these data, it remains a challenge to accurately comprehend and draw conclusions from the entirety of the available information. Therefore, the current study aimed to design and develop a database of the variants identified in 5 different cancer types using 20 different cancer exomes. The exome data were retrieved from NCBI SRA, and an NGS data clean-up protocol was implemented to obtain the best-quality reads. The reads that passed the quality checks were then used for variant calling, and the called variants were processed and filtered. These data were then normalized, and the normalized data were used to develop the database. MutaXome, which stands for mutations in cancer exome, was designed in SQL, with a front end in Bootstrap and HTML and a back end in PHP. The normalized data containing the variants, including Single Nucleotide Polymorphisms (SNPs), were added to MutaXome, which contains detailed information on each type of identified variant. This database, available online via http://www.vidyalab.rf.gd/, serves as a knowledge base for cancer exome variations and could be enriched in prospective studies by linking it to a decision support system.
Affiliation(s)
- P Padmavathi
- Department of Biotechnology, R V College of Engineering, Bengaluru, Karnataka, India
- K Chandrashekar
- Department of Biotechnology, R V College of Engineering, Bengaluru, Karnataka, India
- Anagha S Setlur
- Department of Biotechnology, R V College of Engineering, Bengaluru, Karnataka, India
- Vidya Niranjan
- Department of Biotechnology, R V College of Engineering, Bengaluru, Karnataka, India
7
Abstract
BACKGROUND The Sequence Alignment/Map Format Specification (SAM) is one of the most widely adopted file formats in bioinformatics, and many researchers use it daily. Several tools, including most high-throughput sequencing read aligners, use it as their primary output, and many more tools have been developed to process it. However, despite its flexibility, SAM-encoded files can often be difficult to query and understand, even for experienced bioinformaticians. As genomic data grow rapidly, structured and efficient queries on data encoded in SAM/BAM files are becoming increasingly important. Existing tools are either very limited in their query capabilities or not efficient. Critically, new tools that address these shortcomings should not only be able to support existing large datasets but should also do so without requiring massive data transformations and file infrastructure reorganizations. RESULTS Here we introduce SamQL, an SQL-like query language for the SAM format with intuitive syntax that supports complex and efficient queries on top of SAM/BAM files and that can replace commonly used Bash one-liners employed by many bioinformaticians. SamQL has high expressive power with no upper limit on query size and, when parallelized, outperforms other substantially less expressive software. CONCLUSIONS SamQL is a complete query language that we envision as a step toward a structured database engine for genomics. SamQL is written in Go and is freely available as a standalone program and as an open-source library under an MIT license, https://github.com/maragkakislab/samql/ .
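SamQL's own syntax is not reproduced here. To show the kind of query it targets, the hypothetical Python sketch below parses the 11 mandatory tab-separated SAM fields and applies a WHERE-like predicate, roughly what one might otherwise write as an `awk` one-liner. It reads only uncompressed SAM text (not BAM), and the helper names are our own.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}
FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def parse_sam(lines):
    """Yield one dict per alignment line; header (@...) and blank lines are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        cols = line.split("\t")
        if len(cols) < 11:
            raise ValueError(f"expected 11 mandatory fields, got {len(cols)}")
        record = dict(zip(SAM_FIELDS, cols[:11]))
        for name in INT_FIELDS:
            record[name] = int(record[name])
        record["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, kept as strings
        yield record


def select_qnames(lines, where):
    """Roughly: SELECT QNAME FROM sam WHERE <predicate>."""
    return [r["QNAME"] for r in parse_sam(lines) if where(r)]


SAMPLE = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tchr2\t50\t40\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
]

if __name__ == "__main__":
    # Mapped reads on chr1 with mapping quality >= 30.
    print(select_qnames(SAMPLE, lambda r: r["RNAME"] == "chr1"
                        and r["MAPQ"] >= 30 and not r["FLAG"] & FLAG_UNMAPPED))
```

A dedicated query language adds what this sketch lacks: a declarative syntax, BAM decoding, and parallel execution.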
Affiliation(s)
- Christopher T Lee
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD, 21224, USA
- Manolis Maragkakis
- Laboratory of Genetics and Genomics, National Institute on Aging, Intramural Research Program, National Institutes of Health, Baltimore, MD, 21224, USA.
8
Abstract
Evaluation of forensic evidence using Bayesian statistics requires the formulation of hypotheses. Many hypotheses, especially those presenting the defence viewpoint, imply that traces can be attributed to an arbitrary member of a relevant population. The exact items or persons that make up the relevant population may vary from case to case. Therefore, the statistical evaluation of evidential value based on databases cannot rely on a fixed set of items or persons. In the current paper, a methodology is presented to filter the contents of a database such that only items considered relevant are selected. Six scenarios, including those related to fibre, textile, and glass evidence, are described, together with the hypotheses and relevant populations that may be evaluated by an expert. In addition, we show how items representing the defined relevant population can be extracted from a database using SQL code. Images of the items in the (filtered) relevant population provide an overview of the selected items and hence direct feedback to the examiner. In this way, erroneous codes or unwanted side effects can be identified and corrected. It is concluded that the filtering procedure is effective in cases where the relevant population is demarcated accurately.
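As a hypothetical illustration of defining a relevant population with a SQL `WHERE` clause, the sketch below uses an invented `fibres` table, selects all items of one garment type, and then computes the share of that population sharing a trace's fibre type and colour. The table, columns, rows, and the frequency calculation are our own assumptions, not the paper's database or evaluation model.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE fibres (
        item_id INTEGER PRIMARY KEY, garment TEXT, fibre_type TEXT, colour TEXT)""")
    conn.executemany("INSERT INTO fibres VALUES (?, ?, ?, ?)", [
        (1, "sweater", "wool", "red"),
        (2, "sweater", "acrylic", "red"),
        (3, "jeans", "cotton", "blue"),
        (4, "sweater", "wool", "blue"),
        (5, "sweater", "wool", "red"),
    ])
    return conn


def relevant_population(conn, garment):
    """Items defining the relevant population: here, all items of one garment type."""
    return conn.execute(
        "SELECT item_id, fibre_type, colour FROM fibres WHERE garment = ? ORDER BY item_id",
        (garment,),
    ).fetchall()


def match_frequency(conn, garment, fibre_type, colour):
    """Fraction of the relevant population sharing the trace's fibre type and colour."""
    total, matches = conn.execute(
        """SELECT COUNT(*),
                  SUM(CASE WHEN fibre_type = ? AND colour = ? THEN 1 ELSE 0 END)
           FROM fibres WHERE garment = ?""",
        (fibre_type, colour, garment),
    ).fetchone()
    # An empty population yields COUNT(*) = 0 and SUM = NULL; report 0.0 instead.
    return matches / total if total else 0.0
```

Listing the filtered rows, as `relevant_population` does, plays the same role as the paper's item images: it lets the examiner check that the filter selected what was intended.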
Affiliation(s)
- Daisy de Zwart
- Netherlands Forensic Institute, Section of Microtraces and Materials, P.O. Box 24044, 2490 AA The Hague, the Netherlands
- Jaap van der Weerd
- Netherlands Forensic Institute, Section of Microtraces and Materials, P.O. Box 24044, 2490 AA The Hague, the Netherlands.
9
Amiraslani F, Dragovich D. Wildlife and Newspaper Reporting in Iran: A Data Analysis Approach. Animals (Basel) 2021; 11:1487. [PMID: 34063922 DOI: 10.3390/ani11061487] [Received: 04/19/2021] [Revised: 05/19/2021] [Accepted: 05/20/2021] [Indexed: 11/17/2022]
Abstract
Simple Summary: Three major Iranian daily newspapers were analysed for news items relating to wildlife over a 7-year study period. Wildlife items were characterised by public awareness (51%), columnist contributions (46%), and local spatiality (43%). Most items (82%) were allocated less than half a page. The results highlight the small number of wildlife news items in Iranian newspapers and the little space devoted to them.
Abstract: Human response to wildlife management is widespread, encompassing both human–wildlife conflicts and wildlife conservation, in different places and at different times. As people become increasingly aware of the importance of wildlife to biological and environmental sustainability, newspapers can be important sources of information, especially in developing countries such as Iran. Three major Iranian daily newspapers were analysed for news items related to wildlife. Over the 7-year study period, 434 articles presented environmental news, of which 61 items referred to wildlife. Each wildlife item was recorded in terms of message, contributor, spatiality, and allocated space. Structured Query Language (SQL) was used to analyse relationships between the 915 fields/entries. Wildlife items were characterised by public awareness (51%), columnist contributions (46%), and local spatiality (43%). Most items (82%) were allocated less than half a page. Among the categorised topics, items on endangered land (30%) and marine (5%) species combined exceeded those on global conservation (24%). The results highlighted the small number of wildlife news items, the little space devoted to them, and their concentration (67%) in one of the three sampled newspapers. Although nature has historically been important in Iranian culture, current attitudes to wildlife, as reflected in newspaper coverage, do not seem to mirror these traditional perspectives. Given the widespread distribution of newspapers and their roles (i.e., as sources of information and opinion influencers), global wildlife conservation issues would benefit from much greater coverage in the daily press.
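The abstract reports SQL-based analysis of the coded items but not the queries. As a hypothetical sketch, category shares of the kind reported (the percentage of items per message, contributor, or spatiality value) can be computed with `GROUP BY`. The table layout and the four toy rows below are invented and are not the study's data.

```python
import sqlite3

ALLOWED_COLUMNS = {"message", "contributor", "spatiality"}


def category_shares(rows, column):
    """Percentage of items per value of `column`, computed in SQL with GROUP BY.

    rows: iterable of (message, contributor, spatiality) tuples.
    """
    if column not in ALLOWED_COLUMNS:
        # Whitelist the column name before interpolating it into the SQL text.
        raise ValueError(f"unknown column: {column}")
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE items (
        id INTEGER PRIMARY KEY, message TEXT, contributor TEXT, spatiality TEXT)""")
    conn.executemany(
        "INSERT INTO items (message, contributor, spatiality) VALUES (?, ?, ?)", rows)
    sql = f"""SELECT {column},
                     ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM items), 1) AS pct
              FROM items
              GROUP BY {column}
              ORDER BY pct DESC, {column}"""
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


TOY_ROWS = [
    ("awareness", "columnist", "local"),
    ("awareness", "reporter", "national"),
    ("conflict", "columnist", "local"),
    ("awareness", "columnist", "global"),
]
```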
10
Grust T. From Blackboard to Green Screen: Delivering a Semester of In-Depth Database Lectures via YouTube. ACTA ACUST UNITED AC 2020;:1-11. [PMID: 33390876 DOI: 10.1007/s13222-020-00362-8] [Received: 10/09/2020] [Revised: 11/26/2020] [Accepted: 12/08/2020] [Indexed: 11/30/2022]
Abstract
We report on the conversion of two advanced database courses from their classical lecture-hall setup into an all-digital remote format delivered via YouTube. While the course contents were not turned on their heads, throughout the semester we adopted a video style popularized by the live coding community. This new focus on live interaction with the underlying database systems led us (1) to adopt the idea of SQL probe queries that are specifically crafted to reveal database internals and (2) to study database-supported computation that treats SQL like a true programming language. We are happy to share videos, slides, and code with anyone who is interested.
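As a small illustration of treating SQL as a programming language (our own example, not taken from the course materials), a recursive common table expression behaves like a loop with an initial state, a step, and a termination condition. The sketch runs it in SQLite through Python's standard `sqlite3` module.

```python
import sqlite3

# Recursive CTE: the base row is the loop's initial state, the recursive
# SELECT is the loop body, and its WHERE clause is the loop condition.
FIBONACCI_SQL = """
WITH RECURSIVE fib(n, a, b) AS (
    SELECT 0, 0, 1                  -- initial state: F(0) = 0, F(1) = 1
    UNION ALL
    SELECT n + 1, b, a + b          -- loop body: advance the pair (a, b)
    FROM fib
    WHERE n + 1 < ?                 -- loop condition: stop after k terms
)
SELECT a FROM fib ORDER BY n
"""


def fibonacci(k):
    """First k Fibonacci numbers, computed entirely inside SQLite."""
    if k <= 0:
        return []
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(FIBONACCI_SQL, (k,))]
    finally:
        conn.close()


if __name__ == "__main__":
    print(fibonacci(8))
```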
11
Dolezel D, McLeod A. Big-Data Skills: Bridging the Data Science Theory-Practice Gap in Healthcare. Perspect Health Inf Manag 2020; 18:1j. [PMID: 33633520 PMCID: PMC7883353] [Indexed: 06/12/2023]
Abstract
Demand for big-data scientists continues to escalate, driving a pressing need for new graduates to be more fluent in the big-data skills that employers need. If a gap exists between the knowledge graduates acquire in their education and the big-data skills needed in the workplace to produce results, workers will be unable to address employers' big-data needs. This survey explores the big-data skills taught in the classroom and those required in the workplace to determine whether a skills gap exists for big-data scientists. In this work, data were collected through a national survey of healthcare professionals. Participant responses were analyzed to inform curriculum development, providing valuable information for academics and for the industry leaders who hire new data talent.
Affiliation(s)
- Diane Dolezel
- is an assistant professor at the HIM Department of Texas State University in San Marcos
- Alexander McLeod
- is an associate professor and department chair at the HIM Department of Texas State University in San Marcos
12
Wiewiórka M, Szmurło A, Kuśmirek W, Gambin T. SeQuiLa-cov: A fast and scalable library for depth of coverage calculations. Gigascience 2019; 8:giz094. [PMID: 31378808 PMCID: PMC6680061 DOI: 10.1093/gigascience/giz094] [Received: 12/13/2018] [Revised: 05/24/2019] [Accepted: 07/10/2019] [Indexed: 11/15/2022]
Abstract
BACKGROUND Depth of coverage calculation is an important and computationally intensive preprocessing step in a variety of next-generation sequencing pipelines, including the analysis of RNA-sequencing data, the detection of copy number variants, and quality control procedures. RESULTS Building upon big data technologies, we have developed SeQuiLa-cov, an extension to the recently released SeQuiLa platform, which provides efficient depth of coverage calculations, reaching a >100× speedup over state-of-the-art tools. The performance and scalability of our solution allow exome and genome-wide calculations to run locally or on a cluster while hiding the complexity of distributed computing behind a Structured Query Language (SQL) application programming interface. CONCLUSIONS SeQuiLa-cov provides a significant performance gain in depth of coverage calculations, streamlining widely used bioinformatic processing pipelines.
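SeQuiLa-cov's distributed algorithm is not reproduced here. The single-machine Python sketch below only shows the underlying computation with an event-based (difference-array) approach: each read adds +1 at its start and -1 one past its end, and a running sum yields per-base depth. As a simplifying assumption, reads are gap-free half-open 0-based intervals `[start, end)`, so CIGAR deletions and splicing are ignored.

```python
from itertools import accumulate


def depth_of_coverage(reads, contig_length):
    """Per-base depth for one contig.

    reads: iterable of (start, end) half-open, 0-based intervals; intervals
    are clipped to [0, contig_length).
    """
    delta = [0] * (contig_length + 1)
    for start, end in reads:
        start, end = max(0, start), min(contig_length, end)
        if start < end:
            delta[start] += 1  # coverage rises where a read begins
            delta[end] -= 1    # and falls just past where it ends
    # Prefix sums over the first contig_length entries give the depth at each base.
    return list(accumulate(delta[:-1]))


def to_blocks(depth):
    """Collapse per-base depth into bedGraph-style (start, end, depth) runs."""
    blocks = []
    for pos, d in enumerate(depth):
        if blocks and blocks[-1][2] == d:
            blocks[-1] = (blocks[-1][0], pos + 1, d)  # extend the current run
        else:
            blocks.append((pos, pos + 1, d))
    return blocks


if __name__ == "__main__":
    depth = depth_of_coverage([(0, 4), (2, 6), (3, 5)], 8)
    print(depth, to_blocks(depth))
```

Each read touches only two array cells, so the work is linear in the number of reads plus the contig length; a distributed engine can partition the contig and apply the same idea per partition.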
Affiliation(s)
- Marek Wiewiórka
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
- Agnieszka Szmurło
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
- Wiktor Kuśmirek
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
- Tomasz Gambin
- Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland
13
Howard JC, Florentinus-Mefailoski A, Bowden P, Trimble W, Grinstein S, Marshall JG. OxLDL receptor chromatography from live human U937 cells identifies SYK(L) that regulates phagocytosis of oxLDL. Anal Biochem 2016; 513:7-20. [PMID: 27510553 DOI: 10.1016/j.ab.2016.07.021] [Received: 12/27/2015] [Revised: 06/21/2016] [Accepted: 07/19/2016] [Indexed: 11/16/2022]
Abstract
The binding and activation of macrophages by microscopic aggregates of oxLDL in the intima of the arteries may be an important step towards atherosclerosis, leading to heart attack and stroke. Microbeads coated with oxLDL were used to activate, capture, and isolate the oxLDL receptor complex from the surface of live cells. Analysis of the resulting tryptic peptides by liquid chromatography and tandem mass spectrometry revealed Spleen Tyrosine Kinase (SYK) and much of SYK's known interaction network, including Fc receptors (FCGR2A, FCER1G, and FCGR1A), Toll-like receptor 4 (TLR4), receptor kinases such as EGFRs, as well as RNA binding and metabolism proteins. High-intensity precursor ions (~9 × 10^3 to 2 × 10^5 counts) were correlated to peptides and specific phosphopeptides from the long isoform of SYK (SYK-L) by the SEQUEST, OMSSA, and X!TANDEM algorithms. Peptides or phosphopeptides from SYK were observed with the oxLDL microbeads. Pharmacological inhibitors of SYK activity significantly reduced the engulfment of oxLDL microbeads in the presence of serum factors but had little effect on IgG phagocytosis. Anti-SYK siRNA regulated oxLDL engulfment in the context of serum factors, and/or SYK-L siRNA significantly inhibited the engulfment of oxLDL microbeads, but not IgG microbeads.
Affiliation(s)
- Jeffrey C Howard
- Department of Chemistry and Biology, Ryerson University, Toronto, ON M5B 2K3, Canada
- Peter Bowden
- Department of Chemistry and Biology, Ryerson University, Toronto, ON M5B 2K3, Canada
- William Trimble
- Program in Cell Biology, Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Sergio Grinstein
- Program in Cell Biology, Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- John G Marshall
- Department of Chemistry and Biology, Ryerson University, Toronto, ON M5B 2K3, Canada.
14
Liu Y, Hao S, Song J, Zhou L, Liu J, Wang Q, Yuan D, Xu D. [Development of a method for cleaning outpatient data rapidly and generating statistical reports automatically to the analysis of time series on the air pollution and disease]. Wei Sheng Yan Jiu 2016; 45:624-630. [PMID: 29903334] [Indexed: 06/08/2023]
Abstract
OBJECTIVE To develop a method for rapidly cleaning outpatient data and automatically generating statistical reports. METHODS Cleaning rules were formulated according to the characteristics of the data, and programs were written in SQL to clean individual cases and generate the statistical reports. RESULTS The method cleaned different individual cases rapidly, calculated daily outpatient visits, and generated statistical reports automatically with high accuracy. CONCLUSION The method can be applied to the processing of hospital case data, first-aid cases, cause-of-death data, and health records. It processes large amounts of data flexibly, conveniently, and quickly, and has great practical value. It is therefore a necessary tool for assessing the health risks of air pollution.
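The paper's actual cleaning rules are not given in the abstract. The hypothetical sketch below shows the general pattern in SQL (run through Python's `sqlite3`): reject records with a missing patient id, an unparseable date, or an implausible age; drop exact duplicate visits; then count visits per day as input for a time-series analysis. The table, columns, and rules are our assumptions.

```python
import sqlite3

# Hypothetical cleaning rules followed by a daily aggregation.
CLEAN_AND_COUNT = """
WITH cleaned AS (
    SELECT DISTINCT                       -- drop exact duplicate visits
           patient_id,
           date(visit_date) AS visit_day, -- normalise timestamps to calendar days
           diagnosis_code
    FROM outpatient
    WHERE patient_id IS NOT NULL
      AND date(visit_date) IS NOT NULL    -- date() returns NULL for unparseable text
      AND age BETWEEN 0 AND 120           -- reject implausible (or missing) ages
)
SELECT visit_day, COUNT(*) AS visits
FROM cleaned
GROUP BY visit_day
ORDER BY visit_day
"""


def daily_visits(rows):
    """rows: (patient_id, visit_date, age, diagnosis_code) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE outpatient (
        patient_id TEXT, visit_date TEXT, age INTEGER, diagnosis_code TEXT)""")
    conn.executemany("INSERT INTO outpatient VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(CLEAN_AND_COUNT).fetchall()
    conn.close()
    return result
```

Keeping the rules in one declarative query makes them easy to audit and to reuse on other record types, such as emergency or cause-of-death data.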
Affiliation(s)
- Yue Liu
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Shuxin Hao
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Jie Song
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Lian Zhou
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Jie Liu
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Qiushui Wang
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Dayong Yuan
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
- Dongqun Xu
- Institute for Environmental Health and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 100021, China
15
Dixit A, Dobson RJB. CohortExplorer: A Generic Application Programming Interface for Entity Attribute Value Database Schemas. JMIR Med Inform 2014; 2:e32. [PMID: 25601296 PMCID: PMC4288104 DOI: 10.2196/medinform.3339] [Received: 03/03/2014] [Revised: 08/03/2014] [Accepted: 09/19/2014] [Indexed: 12/03/2022]
Abstract
BACKGROUND Most electronic data capture (EDC) and electronic data management (EDM) systems developed to collect and store clinical data from participants recruited into studies are based on generic entity-attribute-value (EAV) database schemas, which enable rapid and flexible deployment in a range of study designs. The drawback of such schemas is that they are cumbersome to query with structured query language (SQL). The problem grows when researchers involved in multiple studies use multiple electronic data capture and management systems, each with its own variation of the EAV schema. OBJECTIVE The aim of this study is to develop a generic application that allows easy and rapid exploration of data and metadata stored under EAV schemas organized into a survey format (questionnaires/events, questions, values), in other words, the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM). METHODS CohortExplorer is written in the Perl programming language and uses the concept of an SQL abstract, which allows an SQL query to be represented as a hash (key-value pairs). RESULTS We have developed a tool, CohortExplorer, which, once configured for an EAV system, will "plug-n-play" with EAV schemas, enabling the easy construction of complex queries through an abstracted interface. To demonstrate the utility of the CohortExplorer system, we show how it can be used with two popular EAV-based frameworks: Opal (OBiBa) and REDCap. CONCLUSIONS The application is available under a GPL-3+ license on the CPAN website. Currently, the application only provides datasource application programming interfaces (APIs) for Opal and REDCap. In the future, the application will be available with datasource APIs for all major electronic data capture and management systems, such as OpenClinica and LabKey. At present, the application is only compatible with EAV systems where the metadata is organized into surveys, questionnaires and events.
Further work is needed to make the application compatible with EAV schemas where the metadata is organized into hierarchies, such as Informatics for Integrating Biology & the Bedside (i2b2). A video tutorial demonstrating the application setup, datasource configuration, and search features is available on YouTube. The application source code is available on the GitHub website, and users are encouraged to suggest new features and contribute to the development of APIs for new EAV systems.
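The abstract notes that EAV schemas are cumbersome to query with SQL and that CohortExplorer treats a query as a hash of key-value pairs. The following is a minimal sketch of that general idea only, not CohortExplorer's implementation: it is written in Python with the standard `sqlite3` module rather than Perl, and the table name `eav`, its columns, the sample records, and the helper functions `build_eav_db` and `entities_matching` are all hypothetical. It turns a dictionary of attribute-value criteria into a single SQL query over an EAV table.

```python
import sqlite3


def build_eav_db():
    """Create an in-memory EAV table holding a few hypothetical participant records."""
    conn = sqlite3.connect(":memory:")
    # Each row stores one (entity, attribute, value) fact instead of one column per attribute.
    conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
    rows = [
        ("P1", "sex", "F"), ("P1", "smoker", "yes"),
        ("P2", "sex", "M"), ("P2", "smoker", "yes"),
        ("P3", "sex", "F"), ("P3", "smoker", "no"),
    ]
    conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
    return conn


def entities_matching(conn, criteria):
    """Return sorted entity IDs whose attributes match every key-value pair in `criteria`.

    Each (attribute, value) pair becomes one OR-ed condition; an entity qualifies only
    if it satisfies all of them, which is checked with GROUP BY ... HAVING COUNT.
    An empty `criteria` dict returns an empty list (a design choice for this sketch).
    """
    if not criteria:
        return []
    # Only fixed placeholder text is interpolated; all user values are bound as parameters.
    conditions = " OR ".join("(attribute = ? AND value = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    sql = (
        f"SELECT entity FROM eav WHERE {conditions} "
        "GROUP BY entity HAVING COUNT(DISTINCT attribute) = ? ORDER BY entity"
    )
    rows = conn.execute(sql, params + [len(criteria)]).fetchall()
    return [row[0] for row in rows]
```

For example, `entities_matching(build_eav_db(), {"sex": "F", "smoker": "yes"})` returns `['P1']`. Grouping by entity and checking `COUNT(DISTINCT attribute)` avoids writing one self-join per attribute, which is one common way to keep EAV queries manageable as the number of criteria grows.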
Affiliation(s)
- Abhishek Dixit
- Institute of Psychiatry, NIHR Biomedical Research Centre for Mental Health & Biomedical Research Unit for Dementia, South London and Maudsley NHS Foundation Trust & Institute of Psychiatry, Kings College London, London, United Kingdom.
16
Rasmussen MK, Ekstrand B. Regulation of 3β-hydroxysteroid dehydrogenase and sulphotransferase 2A1 gene expression in primary porcine hepatocytes by selected sex-steroids and plant secondary metabolites from chicory (Cichorium intybus L.) and wormwood (Artemisia sp.). Gene 2014; 536:53-8. [PMID: 24333270 DOI: 10.1016/j.gene.2013.11.092] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 10/31/2013] [Accepted: 11/30/2013] [Indexed: 12/15/2022]
Abstract
In pigs, the endogenously produced compound androstenone is metabolised in the liver in two steps by 3β-hydroxysteroid dehydrogenase (3β-HSD) and sulphotransferase 2A1 (SULT2A1). The present study investigated the effect of selected sex-steroids (0.01-1 μM androstenone, testosterone and estradiol), skatole (1-100 μM) and secondary plant metabolites (1-100 μM) on the expression of 3β-HSD and SULT2A1 mRNA. Additionally, the effect of a global methanolic extract of dried chicory root was investigated and compared with previously obtained in vivo effects. Primary hepatocytes were isolated from the livers of piglets (crossbreed: Landrace×Yorkshire and Duroc) and cultured for 24 h before treatment for an additional 24 h. RNA was isolated from the hepatocytes, and specific gene expression was determined by RT-PCR using TaqMan probes. The investigated sex-steroids had no effect on the mRNA expression of 3β-HSD and SULT2A1, while skatole decreased the SULT2A1 mRNA content by 30% compared with control. Of the investigated secondary plant metabolites, artemisinin and scoparone (found in Artemisia sp.) lowered the SULT2A1 mRNA content by 20% and 30% compared with control, respectively. Moreover, we tested three secondary plant metabolites (lactucin, esculetin and esculin) found in chicory root. Lactucin increased the mRNA content of both 3β-HSD and SULT2A1 by 200% compared with control. An extract of chicory root was shown to decrease the expression of both 3β-HSD and SULT2A1. It is concluded that the gene expression of enzymes important for androstenone metabolism is regulated by secondary plant metabolites in a complex manner.
Affiliation(s)
- Bo Ekstrand
- Department of Food Science, Aarhus University, Denmark
17
Knudsen AD, Bennike T, Kjeldal H, Birkelund S, Otzen DE, Stensballe A. Condenser: a statistical aggregation tool for multi-sample quantitative proteomic data from Matrix Science Mascot Distiller™. J Proteomics 2014; 103:261-6. [PMID: 24530376 DOI: 10.1016/j.jprot.2014.02.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/29/2013] [Revised: 01/29/2014] [Accepted: 02/02/2014] [Indexed: 01/07/2023]
Abstract
We describe Condenser, a freely available, comprehensive open-source tool for merging multidimensional quantitative proteomics data from the Matrix Science Mascot Distiller Quantitation Toolbox into a common format ready for subsequent bioinformatic analysis. A number of different relative quantitation technologies, such as metabolic ¹⁵N and amino acid stable isotope incorporation, label-free and chemical-label quantitation, are supported. The program features multiple options for curative filtering of the quantified peptides, allowing the user to choose data quality thresholds appropriate for the current dataset and to ensure the quality of the calculated relative protein abundances. Condenser also features optional global normalization, peptide outlier removal, multiple testing and calculation of t-test statistics for highlighting and evaluating proteins with significantly altered relative protein abundances. Condenser provides an attractive addition to the gold-standard quantitative workflow of Mascot Distiller, allowing easy handling of larger multi-dimensional experiments. Source code, binaries, test data set and documentation are available at http://condenser.googlecode.com/.
18
Zhu P, Bowden P, Zhang D, Marshall JG. Mass spectrometry of peptides and proteins from human blood. Mass Spectrom Rev 2011; 30:685-732. [PMID: 24737629 DOI: 10.1002/mas.20291] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Received: 07/18/2008] [Revised: 12/09/2009] [Accepted: 01/19/2010] [Indexed: 06/03/2023]
Abstract
It is difficult to convey the accelerating rate and growing importance of mass spectrometry applications to human blood proteins and peptides. Mass spectrometry can rapidly detect and identify the ionizable peptides from the proteins in a simple mixture and reveal many of their post-translational modifications. However, blood is a complex mixture that may contain many proteins first expressed in cells and tissues. The complete analysis of blood proteins is a daunting task that will rely on a wide range of disciplines, including physics, chemistry, biochemistry, genetics, electromagnetic instrumentation, mathematics and computation. Therefore, the comprehensive discovery and analysis of blood proteins will rank among the great technical challenges and will require the cumulative sum of many of mankind's scientific achievements. A variety of methods have been used to fractionate, analyze and identify proteins from blood, each yielding a small piece of the whole and throwing the great size of the task into sharp relief. The approaches attempted to date clearly indicate that enumerating the proteins and peptides of blood can be accomplished. There is no doubt that the mass spectrometry of blood will be crucial to the discovery and analysis of proteins, enzyme activities, and post-translational processes that underlie the mechanisms of disease. At present, both discovery and quantification of proteins from blood commonly reach sensitivities of ∼1 ng/mL.
Affiliation(s)
- Peihong Zhu
- Department of Chemistry and Biology, Ryerson University, 350 Victoria Street, Toronto, Ontario, Canada M5B 2K3
19
Abstract
Progress in experimental tools and design is allowing the acquisition of increasingly large datasets. Storage, manipulation and efficient analysis of such large amounts of data are now a primary issue. We present OpenElectrophy, an electrophysiological data- and analysis-sharing framework developed to fill this niche. It stores all experimental data and metadata in a single central MySQL database and provides a graphical user interface for visualizing and exploring the data, as well as a library of Python functions for user analysis scripting. It implements multiple spike-sorting methods and oscillation detection based on the ridge extraction methods of Roux et al. (2007). OpenElectrophy is open source and is freely available for download at http://neuralensemble.org/trac/OpenElectrophy.
Affiliation(s)
- Samuel Garcia
- Neurosciences Sensorielles Comportement Cognition, CNRS - UMR5020 - Université Claude Bernard Lyon 1 Lyon, France