Reference Citation Analysis


Total Articles: 50 (from Reference Citation Analysis)
Article PDFs: 15
Cited by > 0: 38

Searched Name: Exploratory data analysis


Zoehler B, de Aguiar AM, Silveira GF. SAEDC: Development of a technological solution for exploratory data analysis and statistics in cytotoxicity. Comput Struct Biotechnol J 2024;23:483-490. [PMID: 38261941 PMCID: PMC10796974 DOI: 10.1016/j.csbj.2023.12.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 01/25/2024]

Abstract

INTRODUCTION

The Organisation for Economic Co-operation and Development (OECD), an intergovernmental organization, and the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) have developed guidelines for the use of in vitro models in the toxicological evaluation of chemicals. However, the manual steps and multiple tools required for data analysis are costly and time-consuming, and they can lead researchers to introduce errors inadvertently.

OBJECTIVES

We developed the SAEDC platform (the acronym comes from its Portuguese name, which translates as Technological Solution for Exploratory Data Analysis and Statistics for Cytotoxicity), which analyzes cytotoxicity data from assays that follow OECD Guideline No. 129.

METHODOLOGY

We compared the platform against the analysis methodology suggested in the Guideline using in vitro experimental data. We analyzed 117 data sets covering chemicals from Category I to Unclassified according to the GHS classification.

RESULTS

The four parameters of the four-parameter logistic (4PL) non-linear regression calculated by the SAEDC platform showed no significant differences from those of the standard methodology in any of the data sets (p > 0.05). The coefficients of determination (R-squared) also indicated a good fit of the 4PL model to the data and closely matched the values obtained with the conventional methodology. Finally, the SAEDC platform predicted LD50 values for the chemicals from their IC50 values using the Registry of Cytotoxicity (RC) regression models.
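The abstract names the 4PL model, R-squared, and the RC regression without giving formulas. The Python sketch below is an illustration under stated assumptions, not SAEDC's code: it uses the common 4PL parameterisation (bottom, top, IC50, Hill slope), R-squared computed as 1 - SS_res/SS_tot, and, as an assumption to be checked against the Guideline, the commonly cited RC millimole regression log10 LD50 (mmol/kg) = 0.435 × log10 IC50 (mM) + 0.625. Fitting the four parameters to measured data (for example with a non-linear least-squares routine) is left out.

```python
import math
from typing import Sequence


def four_pl(x: float, bottom: float, top: float, ic50: float, hill: float) -> float:
    """4PL response at concentration x > 0.

    With hill > 0 the response falls from `top` toward `bottom` as x grows,
    and it equals the midpoint (top + bottom) / 2 exactly at x == ic50.
    """
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rc_millimole_ld50(ic50_mM: float) -> float:
    """Predicted LD50 (mmol/kg) from an IC50 (mM).

    ASSUMPTION: uses the commonly cited RC millimole regression
    log10(LD50) = 0.435 * log10(IC50) + 0.625; check the coefficients
    against the Guideline before relying on them.
    """
    return 10 ** (0.435 * math.log10(ic50_mM) + 0.625)


if __name__ == "__main__":
    concs = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical concentrations (mM)
    viability = [four_pl(c, bottom=0.0, top=100.0, ic50=1.0, hill=1.0) for c in concs]
    print(viability[2])  # 50.0, the midpoint at x == IC50
    print(rc_millimole_ld50(1.0))  # 10 ** 0.625, about 4.217 mmol/kg
```

Because the 4PL response sits exactly halfway between top and bottom at x = IC50, the IC50 can be read directly from the fitted parameters and then passed to the regression.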

CONCLUSION

The comparison with the standard data analysis methodology showed that the SAEDC platform fulfills the requirements for cytotoxicity data analysis, generating reliable and accurate results with fewer steps performed by researchers. Using the SAEDC platform to obtain toxicity values can reduce analysis time compared with the standard methodology proposed by regulatory agencies. Thus, automating the analysis with the SAEDC platform has the potential to save time and resources for cytotoxicity researchers and laboratories while generating reliable results.


Ordóñez Á, Sánchez E, Carlos Solano J, Parra-Domínguez J. Demand charges reduction with photovoltaics in industry. Heliyon 2024;10:e23404. [PMID: 38169926 PMCID: PMC10758794 DOI: 10.1016/j.heliyon.2023.e23404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 11/13/2023] [Accepted: 12/04/2023] [Indexed: 01/05/2024]

Fan YV, Čuček L, Si C, Jiang P, Vujanović A, Krajnc D, Lee CT. Uncovering environmental performance patterns of plastic packaging waste in high recovery rate countries: An example of EU-27. Environ Res 2024;241:117581. [PMID: 37967705 DOI: 10.1016/j.envres.2023.117581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/17/2023]

Abstract

Plastic consumption and its end-of-life management carry a significant environmental footprint and are energy intensive. Waste-to-resources and prevention strategies have been promoted widely in Europe as countermeasures; however, their effectiveness remains uncertain. This study aims to uncover the environmental footprint patterns of the plastics value chain in the European Union Member States (EU-27) through exploratory data analysis with dimension reduction and grouping. Nine variables are assessed, ranging from socioeconomic and demographic to environmental impacts. Three clusters are formed according to similarity across these nine characteristics, with environmental impacts identified as the primary variable determining the clusters. Most countries belong to Cluster 0, which comprised 17 countries in 2014 and 18 countries in 2019. This cluster has a relatively low global warming potential (GWP), with an average value of 2.64 t CO2eq/cap in 2014 and 4.01 t CO2eq/cap in 2019. Among all the assessed countries, Denmark showed a significant change relative to the rest of the EU-27, moving from Cluster 1 (high GWP) in 2014 to Cluster 0 (low GWP) in 2019. The analysis of plastic packaging waste statistics for 2019 (data released in 2022) shows that, despite an increase in the recovery rate within the EU-27, the GWP has not decreased, suggesting a rebound effect. The GWP tends to increase with the amount of plastic waste. In contrast, other environmental impacts, such as eutrophication, abiotic and acidification potential, are mitigated effectively via recovery, offsetting the adverse effects of increased plastic waste generation. The analysis of data five years apart identified distinct clusters within a set of patterns, categorising them by their similarities. The categorisation and managerial insights serve as a foundation for devising a focused mitigation strategy.


Gonzalez-Ponce K, Horta Andrade C, Hunter F, Kirchmair J, Martinez-Mayorga K, Medina-Franco JL, Rarey M, Tropsha A, Varnek A, Zdrazil B. School of cheminformatics in Latin America. J Cheminform 2023;15:82. [PMID: 37726809 PMCID: PMC10507835 DOI: 10.1186/s13321-023-00758-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 09/10/2023] [Indexed: 09/21/2023]

Hernandez-Betancur JD, Ruiz-Mercado GJ, Martin M. Tracking end-of-life stage of chemicals: A scalable data-centric and chemical-centric approach. Resour Conserv Recycl 2023;196:1-13. [PMID: 37476199 PMCID: PMC10355112 DOI: 10.1016/j.resconrec.2023.107031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2023]

Abstract

Chemical flow analysis (CFA) can be used for collecting life-cycle inventory (LCI), estimating environmental releases, and identifying potential exposure scenarios for chemicals of concern at the end-of-life (EoL) stage. Nonetheless, the demand for comprehensive data and the epistemic uncertainties about the pathways taken by chemical flows make CFA, LCI, and exposure assessment time-consuming and challenging tasks. Owing to the continuous growth of computing power and the appearance of more robust algorithms, data-driven modelling is an attractive tool for streamlining these tasks. However, deploying data-driven models in the real world requires a data ingestion pipeline. Hence, this work contributes a chemical-centric and data-centric approach to extract, transform, and load comprehensive data for CFA at the EoL, integrating cross-year and cross-country data and their provenance as part of the data lifecycle. The framework is scalable and adaptable to production-level machine learning operations. It can supply data at an annual rate, making it possible to handle changes in the statistical distributions of model predictors (such as transferred amount) and target variables (e.g., EoL activity identification) and so avoid data-driven model performance decaying over time. For instance, it can detect that recycling transfers of 643 chemicals over the reporting years (1988 to 2020) amount to 29.87%, 17.79%, and 20.56% for Canada, Australia, and the U.S., respectively. The developed approach also lets research on data-driven modelling connect easily with other data sources for economic information on industry sectors, the economic value of chemicals, and the environmental regulatory implications that may affect the occurrence of an EoL transfer class or activity, such as recycling of a chemical, across years and countries. Finally, stakeholders gain more context about environmental regulation stringency and economic affairs that could affect environmental decision-making and EoL chemical exposure predictions.


Tseng YJ, Chen CJ, Chang CW. lab: an R package for generating analysis-ready data from laboratory records. PeerJ Comput Sci 2023;9:e1528. [PMID: 37705643 PMCID: PMC10495959 DOI: 10.7717/peerj-cs.1528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 07/20/2023] [Indexed: 09/15/2023]

Abstract

Background

Electronic health records (EHRs) play a crucial role in healthcare decision-making by giving physicians insights into disease progression and suitable treatment options. Within EHRs, laboratory test results are frequently used to predict disease progression. However, processing laboratory test results is often challenging because of variations in units and formats. In addition, leveraging the temporal information in EHRs can improve outcome, prognosis, and diagnosis prediction. Nevertheless, the irregular frequency of the data in these records necessitates preprocessing, which can add complexity to time-series analyses.

Methods

To address these challenges, we developed an open-source R package that facilitates the extraction of temporal information from laboratory records. The proposed lab package generates analysis-ready time-series data by segmenting the data into time-series windows and imputing missing values. Moreover, users can map local laboratory codes to the Logical Observation Identifiers Names and Codes (LOINC), an international standard. This mapping allows users to incorporate additional information, such as reference ranges and related diseases. The reference ranges provided by LOINC also allow results to be categorized as normal or abnormal. Finally, the analysis-ready time-series data can be summarized further using descriptive statistics and used to develop models with machine learning technologies.
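The windowing, imputation, and reference-range steps described above can be illustrated in a few lines. This is a hedged Python sketch, not the lab package's API (the package is written in R and its function names and imputation options are not reproduced here): the function names are hypothetical, windows are fixed-width, values within a window are averaged, gaps are filled by last observation carried forward, and the reference range in the usage is an invented example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def window_means(records: List[Tuple[float, float]], window: float,
                 n_windows: int) -> List[Optional[float]]:
    """Average (time, value) records into consecutive fixed-width windows.

    Window i covers [i * window, (i + 1) * window); records outside
    [0, n_windows * window) are ignored. Empty windows become None.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, v in records:
        i = int(t // window)
        if 0 <= i < n_windows:
            buckets[i].append(v)
    return [sum(buckets[i]) / len(buckets[i]) if buckets[i] else None
            for i in range(n_windows)]


def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Impute gaps by last observation carried forward; leading gaps stay None."""
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def flag(value: Optional[float], low: float, high: float) -> Optional[str]:
    """Label a value against a (hypothetical) reference range."""
    if value is None:
        return None
    return "normal" if low <= value <= high else "abnormal"


if __name__ == "__main__":
    # Hypothetical records: (days since index date, lab value).
    raw = [(0.2, 140.0), (0.8, 142.0), (2.5, 150.0)]
    series = locf(window_means(raw, window=1.0, n_windows=3))
    print(series)  # [141.0, 141.0, 150.0]
    print([flag(v, 135.0, 145.0) for v in series])  # ['normal', 'normal', 'abnormal']
```

Carrying the last value forward is only one imputation choice; it keeps each window's value causal (no future information), which matters if the output feeds a prediction model.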

Results

Using the lab package, we analyzed data from MIMIC-III, focusing on newborns with patent ductus arteriosus (PDA). We extracted time-series laboratory records and compared test results between patients with and without 30-day in-hospital mortality. We then identified significant variations in several laboratory test results 7 days after PDA diagnosis. Using the analysis-ready time-series data, we trained a prediction model with the long short-term memory algorithm, achieving an area under the receiver operating characteristic curve of 0.83 for predicting 30-day in-hospital mortality in model training. These findings demonstrate the lab package's effectiveness in analyzing disease progression.

Conclusions

The proposed lab package simplifies and expedites the workflow involved in laboratory records extraction. This tool is particularly valuable in assisting clinical data analysts in overcoming the obstacles associated with heterogeneous and sparse laboratory records.


Koteeswaran S, Suganya R, Surianarayanan C, Neeba EA, Suresh A, Chelliah PR, Buhari SM. A supervised learning approach for the influence of comorbidities in the analysis of COVID-19 mortality in Tamil Nadu. Soft comput 2023:1-15. [PMID: 37362286 PMCID: PMC10238245 DOI: 10.1007/s00500-023-08590-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/19/2023] [Indexed: 06/28/2023]

Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023;21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023]

Abstract

BACKGROUND

In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions.

METHODS

Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD.

RESULTS

The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.
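Multiple testing is one of the subtopics the overview covers. As a hedged illustration of one widely used approach (not necessarily the one the authors recommend for any given setting), here's a Python sketch of the Benjamini-Hochberg step-up procedure, which controls the false discovery rate at level q across m hypotheses by rejecting every hypothesis ranked at or below the largest rank k whose sorted p-value satisfies p_(k) <= k * q / m.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Return one rejection flag per hypothesis under BH FDR control at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Step-up: find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))     # [True, False, False, False]
    print(benjamini_hochberg([0.01, 0.04, 0.045, 0.049]))   # [True, True, True, True]
```

The second call shows the step-up behaviour: 0.04 and 0.045 fail their own thresholds, yet they are rejected because a larger-ranked p-value (0.049 <= 4 × 0.05 / 4) passes.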

CONCLUSIONS

This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.


Pakkan S, Sudhakar C, Tripathi S, Rao M. A correlation study of sustainable development goal (SDG) interactions. Qual Quant 2023;57:1937-1956. [PMID: 35729959 PMCID: PMC9189271 DOI: 10.1007/s11135-022-01443-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/20/2022] [Indexed: 11/05/2022]

Zoiros A, Vrahatis A. Effective Preprocessing of Single-Cell RNA-Seq for Unravelling Alzheimer's Disease Signatures. Adv Exp Med Biol 2023;1423:251-256. [PMID: 37525052 DOI: 10.1007/978-3-031-31978-5_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2023]

Kumar K, Pande BP. Air pollution prediction with machine learning: a case study of Indian cities. Int J Environ Sci Technol (Tehran) 2022;20:5333-5348. [PMID: 35603096 PMCID: PMC9107909 DOI: 10.1007/s13762-022-04241-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/18/2021] [Revised: 02/17/2022] [Accepted: 04/19/2022] [Indexed: 05/06/2023]

Borsboom D. Possible Futures for Network Psychometrics. Psychometrika 2022;87:253-265. [PMID: 35334037 PMCID: PMC9021084 DOI: 10.1007/s11336-022-09851-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 02/02/2022] [Indexed: 06/14/2023]

Jagadev P, Naik S, Giri LI. Contactless monitoring of human respiration using infrared thermography and deep learning. Physiol Meas 2022;43. [PMID: 35193123 DOI: 10.1088/1361-6579/ac57a8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/15/2021] [Accepted: 02/22/2022] [Indexed: 11/11/2022]

Abstract

OBJECTIVE

To monitor the human respiration rate (RR) using infrared thermography (IRT) and artificial intelligence, in a completely non-invasive and automated manner.

APPROACH

The human breathing signals (BS) were obtained using IRT. The RR was monitored under extreme conditions by developing a deep learning (DL)-based "Residual network 50+Facial landmark detection" (ResNet 50+FLD) model. This model was built and evaluated on 10,000 thermograms, and this is the first work to document the use of a DL classifier on a large thermal dataset for nostril tracking. The acquired BS were then filtered using a moving average filter (MAF) and a Butterworth filter (BF). A novel "Breathing signal characterization algorithm" (BSCA) was proposed to obtain the RR automatically. This algorithm is the first to classify the breaths in thermal BS as regular, prolonged, or rapid using machine learning (ML). Exploratory data analysis was performed to choose an appropriate ML algorithm for the BSCA, and the BSCA's performance was evaluated with both decision tree (DT) and support vector machine (SVM) models.
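As a rough picture of the filtering and rate-estimation idea, the Python sketch below smooths a breathing signal with a centered moving average filter (one of the two filters the study compared) and estimates RR as detected peaks per minute. It is a simplified stand-in under stated assumptions, not the authors' pipeline: the Butterworth filter and the ML-based BSCA are not reproduced, the window width is an arbitrary choice, and plain local-maximum counting replaces their breath characterization.

```python
import math
from typing import List


def moving_average(signal: List[float], width: int) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def count_peaks(signal: List[float]) -> int:
    """Count strict interior local maxima (a simple stand-in for breath detection)."""
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i - 1] < signal[i] > signal[i + 1])


def respiration_rate(signal: List[float], fs_hz: float, width: int = 5) -> float:
    """Breaths per minute = peaks in the smoothed signal / duration in minutes."""
    smoothed = moving_average(signal, width)
    duration_min = len(signal) / fs_hz / 60.0
    return count_peaks(smoothed) / duration_min


if __name__ == "__main__":
    fs = 10.0  # hypothetical sampling rate (Hz)
    # Synthetic 60 s breathing trace at 0.25 Hz, i.e. 15 breaths per minute.
    trace = [math.sin(2 * math.pi * 0.25 * n / fs) for n in range(600)]
    print(respiration_rate(trace, fs))  # 15.0
```

On real thermal signals, noise would create spurious local maxima, which is why a stronger low-pass filter and a learned breath classifier, as in the study, are needed in practice.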

MAIN RESULTS

The ResNet 50+FLD model achieved validation and testing accuracies of 99.5% and 99.4%, respectively; precision, sensitivity, specificity, F-measure, and G-mean values were also computed. A comparison of the filters showed that the BF performed better than the MAF. The BSCA performed better with the SVM classifier than with the DT classifier, reaching validation and testing accuracies of 99.5% and 98.83%, respectively.
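The metrics listed above all derive from a binary confusion matrix. The Python sketch below shows the standard formulas; the counts in the usage are hypothetical, and G-mean is taken here as the geometric mean of sensitivity and specificity, which is one common definition (the paper may use a different one, for example the geometric mean of precision and recall).

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts."""
    tp: int
    fp: int
    tn: int
    fn: int

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:  # recall, true positive rate
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:  # true negative rate
        return self.tn / (self.tn + self.fp)

    def f_measure(self) -> float:  # harmonic mean of precision and sensitivity
        p, r = self.precision(), self.sensitivity()
        return 2 * p * r / (p + r)

    def g_mean(self) -> float:  # ASSUMED definition: sqrt(sensitivity * specificity)
        return math.sqrt(self.sensitivity() * self.specificity())

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


if __name__ == "__main__":
    c = Confusion(tp=8, fp=2, tn=9, fn=1)  # hypothetical counts
    print(c.precision(), c.accuracy())  # 0.8 0.85
```

Reporting G-mean alongside accuracy is useful when classes are imbalanced, because it drops sharply if either class is poorly recognised even when overall accuracy stays high.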

SIGNIFICANCE

The ever-increasing number of critical cases and the limited availability of skilled medical attendants argue in favor of an automated and harmless health-monitoring system. The proposed methodology eliminates the risk of infections that spread through contact. It can also be used in darkness and in remote areas that lack medical attendants.


Dodge S, Toka M, Bae CJ. DynamoVis 1.0: an exploratory data visualization software for mapping movement in relation to internal and external factors. Mov Ecol 2021;9:55. [PMID: 34736518 PMCID: PMC8567714 DOI: 10.1186/s40462-021-00291-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 10/21/2021] [Indexed: 05/08/2023]

Cain CN, Sudol PE, Berrier KL, Synovec RE. Development of variance rank initiated-unsupervised sample indexing for gas chromatography-mass spectrometry analysis. Talanta 2021;233:122495. [PMID: 34215113 DOI: 10.1016/j.talanta.2021.122495] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 04/29/2021] [Accepted: 04/30/2021] [Indexed: 02/08/2023]

Abstract

Traditional non-targeted chemometric workflows for gas chromatography-mass spectrometry (GC-MS) data rely on supervised methods, which require a priori knowledge of sample class membership. Herein, we propose a simple, unsupervised chemometric workflow called variance rank initiated-unsupervised sample indexing (VRI-USI). VRI-USI discovers analyte peaks exhibiting high relative variance across all samples and then applies k-means clustering to each individual peak. Based on how the samples cluster for a given peak, a sample index assignment is provided. By a probabilistic argument, if the same sample index assignment appears for several discovered peaks, this outcome strongly suggests that the samples are properly classified by that assignment. Thus, relevant chemical differences between the samples are discovered in an unsupervised fashion. The VRI-USI workflow is demonstrated on three increasingly difficult datasets: simulations, yeast metabolomics, and human cancer metabolomics. For simulated GC-MS datasets, VRI-USI discovered 85-90% of the analytes modeled to vary between sample classes. Nineteen of the 53 peaks in the peak table developed for the yeast metabolome dataset had the same sample index assignment, indicating that this assignment most likely reflects class-distinguishing chemical differences. A t-test revealed that 22 of the 53 peaks were statistically significant (p < 0.05) when using that assignment. Likewise, for the human cancer metabolomics study, VRI-USI discovered 25 analytes that were statistically different (p < 0.05) using the sample index assignments determined to highlight meaningful sample-based differences. For all datasets, the sample index assignments deduced from VRI-USI matched the correct class-based differences known from prior knowledge. VRI-USI holds promise as an exploratory data analysis workflow for studies in which analysts do not readily have a priori class information or want to uncover the underlying nature of their dataset.
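The core loop described above (rank peaks by relative variance, cluster each peak's per-sample values, count repeated sample index assignments) can be sketched in Python. This is an illustration under assumptions, not the authors' implementation: "relative variance" is taken here as the coefficient of variation, each peak is split into two clusters with a plain 1D k-means, and the peak names and intensities in the usage are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Assignment = Tuple[int, ...]


def relative_variance(values: List[float]) -> float:
    """Coefficient of variation (std / |mean|), ASSUMED here as 'relative variance'."""
    m = mean(values)
    return pstdev(values) / abs(m) if m else 0.0


def kmeans_1d_two(values: List[float], max_iter: int = 50) -> Assignment:
    """Two-cluster 1D k-means on one peak's per-sample values."""
    c0, c1 = min(values), max(values)  # start centroids at the extremes
    labels: Assignment = ()
    for _ in range(max_iter):
        labels = tuple(0 if abs(v - c0) <= abs(v - c1) else 1 for v in values)
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_c0, new_c1 = mean(g0), (mean(g1) if g1 else c1)
        if (new_c0, new_c1) == (c0, c1):
            break  # centroids stopped moving
        c0, c1 = new_c0, new_c1
    # Canonical form: sample 0 is always in cluster 0, so label-swapped
    # versions of the same partition count as one assignment.
    if labels[0] == 1:
        labels = tuple(1 - lab for lab in labels)
    return labels


def most_common_assignment(peaks: Dict[str, List[float]],
                           top_n: int) -> Tuple[Assignment, int]:
    """Cluster the top_n highest-variance peaks; return the most frequent assignment and its count."""
    ranked = sorted(peaks, key=lambda name: relative_variance(peaks[name]), reverse=True)
    counts = Counter(kmeans_1d_two(peaks[name]) for name in ranked[:top_n])
    return counts.most_common(1)[0]


if __name__ == "__main__":
    # Hypothetical peak intensities for four samples.
    peaks = {"peak_a": [10.0, 11.0, 30.0, 31.0],
             "peak_b": [5.0, 6.0, 20.0, 21.0],
             "peak_c": [100.0, 101.0, 100.0, 101.0]}
    print(most_common_assignment(peaks, top_n=2))  # ((0, 0, 1, 1), 2)
```

Canonicalising the labels matters for the counting step: without it, two peaks that split the samples identically could be recorded as different assignments simply because k-means numbered the clusters the other way round.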


Rosa LK, Costa FS, Hauagge CM, Mobile RZ, de Lima AAS, Amaral CDB, Machado RC, Nogueira ARA, Brancher JA, de Araujo MR. Oral health, organic and inorganic saliva composition of men with Schizophrenia: Case-control study. J Trace Elem Med Biol 2021;66:126743. [PMID: 33740480 DOI: 10.1016/j.jtemb.2021.126743] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2020] [Revised: 03/03/2021] [Accepted: 03/09/2021] [Indexed: 11/30/2022]

Wentzell PD, Gonçalves TR, Matsushita M, Valderrama P. Combinatorial projection pursuit analysis for exploring multivariate chemical data. Anal Chim Acta 2021;1174:338716. [PMID: 34247741 DOI: 10.1016/j.aca.2021.338716] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/01/2021] [Revised: 05/26/2021] [Accepted: 05/30/2021] [Indexed: 11/19/2022]

Abstract

Kurtosis-based projection pursuit analysis (kPPA) has demonstrated the ability to visualize multivariate data in a way that complements other exploratory data analysis tools, such as principal components analysis (PCA). It is especially useful for partitioning binary data sets (2^k classes) with a balanced design. Since kPPA is not a variance-based method, it can often provide unsupervised class separation where other methods fail. However, when multiple classifications are possible (e.g. by gender, age, disease state, etc.), the projection provided by kPPA (corresponding to the global minimum kurtosis) will not necessarily be the one of greatest interest to the researcher. Fortunately, the optimization algorithm for kPPA allows for interrogation of projections obtained from numerous local minima. This strategy provides the basis of a new method described here, referred to as combinatorial projection pursuit analysis (CombPPA) because it presents alternative combinations of class separation. The method is truly exploratory in that it allows the landscape of interesting projections to be more fully probed. The approach uses Procrustes rotation to map local minima among the kPPA solutions, whereupon the researcher can visualize different projections. To demonstrate the new method, the clustering of grape juice samples using visible spectroscopy is presented as a model problem. This problem is well-suited to this type of study because there are eight classes of samples symmetrically partitioned into two classes by type (organic/non-organic) or four classes by brand. Results presented show the different combinations of projections that can be obtained, including the desired partitions. In addition, this work describes new enhancements to the kPPA algorithm that improve the orthogonality of solutions obtained.
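The idea behind kPPA, that a projection separating two balanced classes has unusually low kurtosis, can be checked with a small Python sketch. It is an assumption-laden toy, not the kPPA or CombPPA algorithm: it scans 2D directions by brute force instead of using the authors' optimizer, returns only the global minimum rather than mapping local minima with Procrustes rotation, and omits any preprocessing (such as dimension reduction) a practical implementation would apply. The data in the usage are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kurtosis(xs: Sequence[float]) -> float:
    """Non-excess kurtosis m4 / m2**2 (about 3 for Gaussian data, 1 for two equal point masses)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 / (m2 * m2)


def min_kurtosis_direction(points: List[Tuple[float, float]],
                           steps: int = 180) -> Tuple[float, float]:
    """Brute-force scan of unit directions in 2D for the lowest-kurtosis projection.

    Returns (angle_radians, kurtosis). Directions theta and theta + pi give the
    same kurtosis, so only [0, pi) is scanned.
    """
    best = (0.0, math.inf)
    for k in range(steps):
        theta = math.pi * k / steps
        ux, uy = math.cos(theta), math.sin(theta)
        proj = [x * ux + y * uy for x, y in points]
        kt = kurtosis(proj)
        if kt < best[1]:
            best = (theta, kt)
    return best


if __name__ == "__main__":
    # Two hypothetical classes at x = -1 and x = +1, spread along y.
    ys = [-2.0, -1.0, 0.0, 1.0, 2.0]
    pts = [(-1.0, y) for y in ys] + [(1.0, y) for y in ys]
    print(min_kurtosis_direction(pts))  # (0.0, 1.0): the x-axis separates the classes
```

Note that the projection onto y, which carries more variance, has higher kurtosis (1.7) than the class-separating x-axis projection (1.0); this is why a kurtosis criterion can find class structure that a variance-based method such as PCA would rank second.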


Fife DA, Longo G, Correll M, Tremoulet PD. A graph for every analysis: Mapping visuals onto common analyses using flexplot. Behav Res Methods 2021;53:1876-1894. [PMID: 33634423 DOI: 10.3758/s13428-020-01520-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 12/05/2020] [Indexed: 11/08/2022]

Ratnovsky A, Rozenes S, Bloch E, Halpern P. Statistical learning methodologies and admission prediction in an emergency department. Australas Emerg Care 2021;24:241-247. [PMID: 33461906 DOI: 10.1016/j.auec.2020.11.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/29/2020] [Revised: 10/07/2020] [Accepted: 11/25/2020] [Indexed: 11/30/2022]

Risso D. Normalization of Single-Cell RNA-Seq Data. Methods Mol Biol 2021;2284:303-329. [PMID: 33835450 DOI: 10.1007/978-1-0716-1307-8_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]

Tseng YJ, Chiu HJ, Chen CJ. dxpr: an R package for generating analysis-ready data from electronic health records-diagnoses and procedures. PeerJ Comput Sci 2021;7:e520. [PMID: 34141876 PMCID: PMC8176530 DOI: 10.7717/peerj-cs.520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/16/2020] [Accepted: 04/09/2021] [Indexed: 05/08/2023]

Abstract

BACKGROUND

Enriched electronic health records (EHRs) contain crucial information related to disease progression, and this information can help with decision-making in the health care field. Data analytics is considered one of the essential processes that help accelerate the progress of clinical research. However, processing and analyzing EHR data are common bottlenecks in health care data analytics.

METHODS

The dxpr R package provides mechanisms for integration, wrangling, and visualization of clinical data, including diagnosis and procedure records. First, the dxpr package helps users transform International Classification of Diseases (ICD) codes to a uniform format. After code format transformation, the dxpr package supports four strategies for grouping clinical diagnostic data. For clinical procedure data, two grouping methods can be chosen. After EHRs are integrated, users can employ a set of flexible built-in querying functions for dividing data into case and control groups by using specified criteria and splitting the data into before and after an event based on the record date. Subsequently, the structure of integrated long data can be converted into wide, analysis-ready data that are suitable for statistical analysis and visualization.
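The code-normalisation, case/control, and long-to-wide steps above can be pictured with a short Python sketch. It is hypothetical throughout: dxpr is an R package and none of its functions or grouping strategies are reproduced here. The sketch assumes decimal ICD-9-CM codes can be shortened by dropping the dot, defines cases by a code prefix in the same spirit as the study's PDA definition (ICD-9-CM 7470*), and uses an invented prefix-to-comorbidity mapping.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (patient_id, diagnosis_code)


def to_short_icd9(code: str) -> str:
    """Convert decimal ICD-9-CM format (e.g. '747.0') to short format ('7470')."""
    return code.replace(".", "").strip().upper()


def split_cases(records: List[Record], prefix: str) -> Tuple[Set[str], Set[str]]:
    """Cases: patients with >= 1 code starting with prefix; controls: everyone else."""
    patients = {pid for pid, _ in records}
    cases = {pid for pid, code in records if to_short_icd9(code).startswith(prefix)}
    return cases, patients - cases


def to_wide(records: List[Record], groups: Dict[str, str]) -> Dict[str, Dict[str, bool]]:
    """Long (patient, code) rows -> one row per patient with a boolean per group.

    `groups` maps a short-code prefix to a comorbidity label (a HYPOTHETICAL
    mapping, not one of dxpr's grouping strategies).
    """
    labels = sorted(set(groups.values()))
    wide = {pid: {lab: False for lab in labels} for pid, _ in records}
    for pid, code in records:
        short = to_short_icd9(code)
        for pfx, lab in groups.items():
            if short.startswith(pfx):
                wide[pid][lab] = True
    return wide


if __name__ == "__main__":
    recs = [("p1", "747.0"), ("p1", "276.5"), ("p2", "V30.00"), ("p3", "7470")]
    cases, controls = split_cases(recs, "7470")
    print(sorted(cases), sorted(controls))  # ['p1', 'p3'] ['p2']
    print(to_wide(recs, {"276": "fluid_electrolyte", "7470": "pda"})["p1"])
```

Normalising every code to one format before matching is what lets a single prefix rule catch both '747.0' and '7470', which is the practical point of the package's code-transformation step.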

RESULTS

We processed comorbidity data for a cohort of newborns from the Medical Information Mart for Intensive Care-III (n = 7,833) using the dxpr package. We first defined patent ductus arteriosus (PDA) cases as patients who had at least one PDA diagnosis (ICD, Ninth Revision, Clinical Modification [ICD-9-CM] 7470*). Controls were defined as patients who never had a PDA diagnosis. In total, 381 patients with PDA and 7,452 without were included in our study population. We then grouped the diagnoses into defined comorbidities. Finally, we observed a statistically significant difference in 8 of the 16 comorbidities between patients with and without PDA, including fluid and electrolyte disorders, valvular disease, and others.

CONCLUSIONS

The dxpr package helps clinical data analysts address the common bottlenecks caused by clinical data characteristics such as heterogeneity and sparseness.


Adeniyi MO, Ekum MI, C I, S OA, A AJ, Oke SI, B MM. Dynamic model of COVID-19 disease with exploratory data analysis. Sci Afr 2020;9:e00477. [PMID: 33521409 DOI: 10.1016/j.sciaf.2020.e00477] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/19/2020] [Revised: 06/17/2020] [Accepted: 07/09/2020] [Indexed: 11/22/2022]

Yousif A, Drou N, Rowe J, Khalfan M, Gunsalus KC. NASQAR: a web-based platform for high-throughput sequencing data analysis and visualization. BMC Bioinformatics 2020;21:267. [PMID: 32600310 DOI: 10.1186/s12859-020-03577-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/08/2019] [Accepted: 06/01/2020] [Indexed: 01/23/2023]

Giguere DJ, Macklaim JM, Lieng BY, Gloor GB. omicplotR: visualizing omic datasets as compositions. BMC Bioinformatics 2019;20:580. [PMID: 31729955 PMCID: PMC6858670 DOI: 10.1186/s12859-019-3174-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2019] [Accepted: 10/24/2019] [Indexed: 12/02/2022]

Álvarez Sánchez R, Beristain Iraola A, Epelde Unanue G, Carlin P. TAQIH, a tool for tabular data quality assessment and improvement in the context of health data. Comput Methods Programs Biomed 2019;181:104824. [PMID: 30638900 DOI: 10.1016/j.cmpb.2018.12.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/27/2018] [Revised: 09/14/2018] [Accepted: 12/28/2018] [Indexed: 06/09/2023]

Abstract

BACKGROUND AND OBJECTIVES

Data curation is a tedious task, but it is of paramount relevance for data analytics, especially in the health context, where data-driven decisions must be extremely accurate. TAQIH aims to support non-technical users in 1) the exploratory data analysis (EDA) process of tabular health data, and 2) the assessment and improvement of its quality.

METHODS

A web-based tool has been implemented with a simple yet powerful visual interface. First, it provides interfaces for understanding the dataset's content, structure and distribution. Then, it provides data visualization and improvement utilities for the data quality dimensions of completeness, accuracy, redundancy and readability.

RESULTS

It has been applied in two different scenarios: (1) the Northern Ireland General Practitioners (GPs) Prescription Data, an open dataset containing drug prescriptions, and (2) a glucose-monitoring telehealth system dataset. Findings on (1) include: features with a significant proportion of missing values (e.g. the AMP_NM variable, 53.39%); instances with a high percentage of missing variable values (e.g. 0.21% of the instances have > 75% missing values); and highly correlated variables (e.g. Gross and Actual cost are almost completely correlated (∼ +1.0)). Findings on (2) include: features with a significant proportion of missing values (e.g. patient height, weight and body mass index (BMI) (> 70%), and date of diagnosis (13%)); and highly correlated variables (e.g. height, weight and BMI). Full details of the testing and the insights related to these findings are reported.

CONCLUSIONS

TAQIH enables users to carry out EDA on tabular health data and to assess and improve its quality. Arranging the application menu sequentially, following the conventional EDA pipeline, helps users follow a consistent analysis process. The general description of the dataset and its features is very useful for a first overview of the dataset. The missing-value heatmap is also very helpful for visually identifying correlations among missing values. The correlations section, like the outliers section, has proved supportive as a preliminary step before further data analysis pipelines. Finally, the data quality section provides a quantitative measure of the dataset improvements.
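TAQIH itself is a web tool, but the two missing-value summaries reported in its results (the percentage of missing values per feature, and the share of instances with more than 75% of their values missing) are simple to compute. The sketch below shows them in plain Python; the record layout and field names are hypothetical, and this is not TAQIH's implementation.

```python
# Hypothetical tabular health records; None marks a missing value.
# Field names are illustrative, not taken from the TAQIH datasets.
rows = [
    {"height": 170, "weight": 68, "bmi": None, "diagnosis_date": "2017-03-01"},
    {"height": None, "weight": None, "bmi": None, "diagnosis_date": None},
    {"height": None, "weight": 80, "bmi": 27.7, "diagnosis_date": "2016-11-20"},
    {"height": 158, "weight": None, "bmi": None, "diagnosis_date": "2018-01-09"},
]


def feature_missing_pct(rows):
    """Percentage of missing values for each feature (column)."""
    columns = rows[0].keys()
    return {
        col: 100.0 * sum(r[col] is None for r in rows) / len(rows)
        for col in columns
    }


def rows_above_missing_threshold(rows, threshold=0.75):
    """Percentage of instances (rows) whose fraction of missing values exceeds `threshold`."""
    flagged = sum(
        sum(v is None for v in r.values()) / len(r) > threshold for r in rows
    )
    return 100.0 * flagged / len(rows)


print(feature_missing_pct(rows))
print(rows_above_missing_threshold(rows))
```

The same per-column missingness indicators (value is None or not) are what a missing-value heatmap plots, one cell per row and feature.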


Kluxen FM, Grégoire S, Schepky A, Hewitt NJ, Klaric M, Domoradzki JY, Felkers E, Fernandes J, Fisher P, McEuen SF, Parr-Dobrzanski R, Wiemann C. Dermal absorption study OECD TG 428 mass balance recommendations based on the EFSA database. Regul Toxicol Pharmacol 2019;108:104475. [PMID: 31539567 DOI: 10.1016/j.yrtph.2019.104475] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/18/2019] [Revised: 08/21/2019] [Accepted: 09/13/2019] [Indexed: 11/24/2022]

Orjuela S, Huang R, Hembach KM, Robinson MD, Soneson C. ARMOR: An Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data. G3 (Bethesda) 2019;9:2089-2096. [PMID: 31088905 PMCID: PMC6643886 DOI: 10.1534/g3.119.400185] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/12/2019] [Accepted: 05/13/2019] [Indexed: 01/08/2023]

Birgen C, Dürre P, Preisig HA, Wentzel A. Butanol production from lignocellulosic biomass: revisiting fermentation performance indicators with exploratory data analysis. Biotechnol Biofuels 2019;12:167. [PMID: 31297155 PMCID: PMC6598312 DOI: 10.1186/s13068-019-1508-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 04/04/2019] [Accepted: 06/19/2019] [Indexed: 05/09/2023]

Marini F, Binder H. pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components. BMC Bioinformatics 2019;20:331. [PMID: 31195976 PMCID: PMC6567655 DOI: 10.1186/s12859-019-2879-1] [Citation(s) in RCA: 118] [Impact Index Per Article: 23.6] [Received: 11/23/2018] [Accepted: 05/07/2019] [Indexed: 12/11/2022]

Sirén K, Fischer U, Vestner J. Automated supervised learning pipeline for non-targeted GC-MS data analysis. Anal Chim Acta X 2019;1:100005. [PMID: 33117972 PMCID: PMC7587030 DOI: 10.1016/j.acax.2019.100005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/15/2018] [Revised: 12/21/2018] [Accepted: 01/02/2019] [Indexed: 11/15/2022]

Breault MS, Sacré P, González-Martínez J, Gale JT, Sarma SV. An exploratory data analysis method for identifying brain regions and frequencies of interest from large-scale neural recordings. J Comput Neurosci 2019;46:3-17. [PMID: 30511274 DOI: 10.1007/s10827-018-0705-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/17/2018] [Revised: 08/28/2018] [Accepted: 10/23/2018] [Indexed: 10/27/2022]

Kneale C, Brown SD. Uncharted forest: A technique for exploratory data analysis. Talanta 2018;189:71-8. [PMID: 30086977 DOI: 10.1016/j.talanta.2018.06.061] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/21/2018] [Revised: 06/18/2018] [Accepted: 06/19/2018] [Indexed: 11/22/2022]

Gross T, Mapstone M, Miramontes R, Padilla R, Cheema AK, Macciardi F, Federoff HJ, Fiandaca MS. Toward Reproducible Results from Targeted Metabolomic Studies: Perspectives for Data Pre-processing and a Basis for Analytic Pipeline Development. Curr Top Med Chem 2018;18:883-895. [PMID: 29992885 DOI: 10.2174/1568026618666180711144323] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/16/2018] [Revised: 06/20/2018] [Accepted: 06/28/2018] [Indexed: 11/22/2022]

Affiliation(s)

Thomas Gross: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Anatomy & Neurobiology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States
Mark Mapstone: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Neurology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States
Ricardo Miramontes: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Neurology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States
Robert Padilla: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Neurology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States
Amrita K Cheema: Department of Oncology, Georgetown University Medical Center, Washington, DC 20007, United States; Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
Fabio Macciardi: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Psychiatry and Human Behavior, University of California, Irvine School of Medicine, Irvine, CA 92697, United States
Howard J Federoff: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Neurology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; UCI Health, University of California, Irvine School of Medicine, Irvine, CA 92697, United States
Massimo S Fiandaca: Translational Laboratory and Biorepository, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Anatomy & Neurobiology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Neurology, University of California, Irvine School of Medicine, Irvine, CA 92697, United States; Department of Neurological Surgery, University of California, Irvine School of Medicine, Irvine, CA 92697, United States


Owolabi FO, Oguntunde PE, Adetula DT, Fakile SA. Learning analytics: Data sets on the academic record of accounting students in a Nigerian University. Data Brief 2018;19:1614-1619. [PMID: 30246078 PMCID: PMC6141959 DOI: 10.1016/j.dib.2018.06.078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/30/2018] [Revised: 06/15/2018] [Accepted: 06/19/2018] [Indexed: 11/24/2022]

Abeysinghe R, Cui L. Query-constraint-based mining of association rules for exploratory analysis of clinical datasets in the National Sleep Research Resource. BMC Med Inform Decis Mak 2018;18:58. [PMID: 30066656 PMCID: PMC6069291 DOI: 10.1186/s12911-018-0633-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/14/2022]

Abstract

Background

Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, performing ARM on high-dimensional biomedical datasets yields a large number of rules, many of which may be uninteresting. For imbalanced datasets in particular, applying ARM directly produces uninteresting rules dominated by certain variables that capture general characteristics.

Methods

We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on a subset of data items satisfying a query constraint. We first perform a series of data-preprocessing steps, including variable selection, merging semantically similar variables, combining multiple-visit data, and data transformation. We use the Top-k Non-Redundant (TNR) ARM algorithm to generate association rules. We then remove general and subsumed rules so that only unique, non-redundant rules remain for a particular query constraint.

Results

Applying QARM to five NSRR datasets yielded a total of 2517 association rules with a minimum confidence of 60% (using the top 100 rules for each query constraint). The results show that merging similar variables can help avoid uninteresting rules, and that removing general and subsumed rules resulted in a more concise and interesting set of rules.

Conclusions

QARM shows the potential to support exploratory analysis of large biomedical datasets. It also proved useful for reducing the number of uninteresting association rules generated from imbalanced datasets. A preliminary literature-based analysis showed that some association rules have supporting evidence in the biomedical literature, while others without such evidence may serve as candidates for new hypotheses to explore and investigate. Together with literature-based evidence, the association rules mined from the NSRR clinical datasets may be used to support clinical decisions for sleep-related problems.

Electronic supplementary material

The online version of this article (10.1186/s12911-018-0633-7) contains supplementary material, which is available to authorized users.
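The core idea of a query constraint (mine rules only on the records that satisfy the query, then keep rules above a minimum confidence such as 60%) can be sketched as follows. This is a brute-force enumeration of single-item rules, not the Top-k Non-Redundant (TNR) algorithm used in the paper, and it omits the general/subsumed-rule removal step. The transactions and item names are hypothetical.

```python
from itertools import permutations

# Hypothetical transactions: the set of items (variable=value pairs) recorded for one subject.
# Illustrative data only; not taken from the NSRR datasets.
transactions = [
    {"sex=F", "snoring=yes", "hypertension=yes"},
    {"sex=F", "snoring=yes", "hypertension=no"},
    {"sex=F", "snoring=yes", "hypertension=yes"},
    {"sex=M", "snoring=yes", "hypertension=yes"},
    {"sex=M", "snoring=no", "hypertension=no"},
]


def confidence(subset, antecedent, consequent):
    """conf(X -> Y) = count(X and Y) / count(X), counted within `subset`."""
    with_x = [t for t in subset if antecedent <= t]
    if not with_x:
        return 0.0
    return sum(consequent <= t for t in with_x) / len(with_x)


def query_constrained_rules(transactions, query, min_conf=0.6):
    """Brute-force single-item rules X -> Y mined only on transactions satisfying `query`.

    Items fixed by the query are excluded from rules, since they hold for every
    transaction in the subset. Returns sorted (antecedent, consequent, confidence) tuples.
    """
    subset = [t for t in transactions if query <= t]
    items = set().union(*subset) - query
    rules = []
    for x, y in permutations(sorted(items), 2):
        conf = confidence(subset, {x}, {y})
        if conf >= min_conf:
            rules.append((x, y, conf))
    return sorted(rules)


for x, y, conf in query_constrained_rules(transactions, {"sex=F"}):
    print(f"{x} -> {y} (confidence {conf:.2f})")
```

Restricting to the query subset changes the denominators: over all five transactions, snoring=yes -> hypertension=yes has confidence 0.75, while within the sex=F subset it drops to about 0.67.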


Buttarazzi D, Pandolfo G, Porzio GC. A boxplot for circular data. Biometrics 2018;74:1492-1501. [PMID: 29782636 DOI: 10.1111/biom.12889] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/01/2016] [Revised: 03/01/2018] [Accepted: 03/01/2018] [Indexed: 11/28/2022]

Alcaide D, Aerts J. MCLEAN: Multilevel Clustering Exploration As Network. PeerJ Comput Sci 2018;4:e145. [PMID: 33816801 PMCID: PMC7924466 DOI: 10.7717/peerj-cs.145] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/04/2017] [Accepted: 01/10/2018] [Indexed: 05/29/2023]

Zhu Q, Fisher SA, Dueck H, Middleton S, Khaladkar M, Kim J. PIVOT: platform for interactive analysis and visualization of transcriptomics data. BMC Bioinformatics 2018;19:6. [PMID: 29304726 PMCID: PMC5756333 DOI: 10.1186/s12859-017-1994-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 08/13/2017] [Accepted: 12/06/2017] [Indexed: 11/29/2022]

Palace-Berl F, Pasqualoto KFM, Zingales B, Moraes CB, Bury M, Franco CH, da Silva Neto AL, Murayama JS, Nunes SL, Silva MN, Tavares LC. Investigating the structure-activity relationships of N'-[(5-nitrofuran-2-yl) methylene] substituted hydrazides against Trypanosoma cruzi to design novel active compounds. Eur J Med Chem 2017;144:29-40. [PMID: 29247858 DOI: 10.1016/j.ejmech.2017.12.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/23/2017] [Revised: 11/29/2017] [Accepted: 12/02/2017] [Indexed: 10/18/2022]

Ge SX. Exploratory bioinformatics investigation reveals importance of "junk" DNA in early embryo development. BMC Genomics 2017;18:200. [PMID: 28231763 PMCID: PMC5324221 DOI: 10.1186/s12864-017-3566-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 10/13/2016] [Accepted: 02/07/2017] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Instead of testing predefined hypotheses, the goal of exploratory data analysis (EDA) is to find what data can tell us. Following this strategy, we re-analyzed a large body of genomic data to study the complex gene regulation in mouse pre-implantation development (PD).

RESULTS

Starting with a single-cell RNA-seq dataset consisting of 259 mouse embryonic cells derived from zygote to blastocyst stages, we reconstructed the temporal and spatial gene expression pattern during PD. The dynamics of gene expression can be partially explained by the enrichment of transposable elements in gene promoters and the similarity of expression profiles with those of corresponding transposons. Long Terminal Repeats (LTRs) are associated with transient, strong induction of many nearby genes at the 2-4 cell stages, probably by providing binding sites for Obox and other homeobox factors. B1 and B2 SINEs (Short Interspersed Nuclear Elements) are correlated with the upregulation of thousands of nearby genes during zygotic genome activation. Such enhancer-like effects are also found for human Alu and bovine tRNA SINEs. SINEs also seem to be predictive of gene expression in embryonic stem cells (ESCs), raising the possibility that they may also be involved in regulating pluripotency. We also identified many potential transcription factors underlying PD and discussed the evolutionary necessity of transposons in enhancing genetic diversity, especially for species with longer generation time.

CONCLUSIONS

Together with other recent studies, our results provide further evidence that many transposable elements may play a role in establishing the expression landscape in early embryos. They also demonstrate that exploratory bioinformatics investigation can pinpoint developmental pathways for further study and serve as a strategy for generating novel insights from big genomic data.


Komenda M, Karolyi M, Pokorná A, Vaitsis C. Medical and Healthcare Curriculum Exploratory Analysis. Stud Health Technol Inform 2017;235:231-235. [PMID: 28423788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]

Pimentel H, Sturmfels P, Bray N, Melsted P, Pachter L. The Lair: a resource for exploratory analysis of published RNA-Seq data. BMC Bioinformatics 2016;17:490. [PMID: 27905880 DOI: 10.1186/s12859-016-1357-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/03/2016] [Accepted: 11/19/2016] [Indexed: 11/10/2022]

González-Calabozo JM, Valverde-Albacete FJ, Peláez-Moreno C. Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis. BMC Bioinformatics 2016;17:374. [PMID: 27628041 DOI: 10.1186/s12859-016-1234-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/01/2016] [Accepted: 09/01/2016] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Gene Expression Data (GED) analysis poses a great challenge to the scientific community, one that can be framed within the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice for this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA).

RESULTS

We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process, with support for continuous human interaction with the data aimed at improving the steps of hypothesis abduction and assessment. We focus on adapting the interpretation and visualization of EDA output to human cognition. First, we give a proper theoretical background to biclustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness with which to assess the biclusters. Second, the resulting biclusters are used to index external omics databases, for instance Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example, confirming previously published results.

CONCLUSIONS

The GED analysis problem is thus transformed into the exploration of a sequence of lattices, enabling visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, and to look for relevant biclusters, by observing their genes and their persistence, in order to infer, for instance, hypotheses on their function.


Konarska M, Kuchida K, Tarr G, Polkinghorne RJ. Relationships between marbling measures across principal muscles. Meat Sci 2016;123:67-78. [PMID: 27639062 DOI: 10.1016/j.meatsci.2016.09.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 03/30/2016] [Revised: 08/16/2016] [Accepted: 09/09/2016] [Indexed: 11/24/2022]

Palace-Berl F, Pasqualoto KF, Jorge SD, Zingales B, Zorzi RR, Silva MN, Ferreira AK, de Azevedo RA, Teixeira SF, Tavares LC. Designing and exploring active N'-[(5-nitrofuran-2-yl) methylene] substituted hydrazides against three Trypanosoma cruzi strains more prevalent in Chagas disease patients. Eur J Med Chem 2015;96:330-9. [PMID: 25899337 DOI: 10.1016/j.ejmech.2015.03.066] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/27/2014] [Revised: 02/26/2015] [Accepted: 03/30/2015] [Indexed: 12/28/2022]

Damião MCFCB, Pasqualoto KFM, Ferreira AK, Teixeira SF, Azevedo RA, Barbuto JAM, Palace-Berl F, Franchi-Junior GC, Nowill AE, Tavares MT, Parise-Filho R. Novel capsaicin analogues as potential anticancer agents: synthesis, biological evaluation, and in silico approach. Arch Pharm (Weinheim) 2014;347:885-95. [PMID: 25283529 DOI: 10.1002/ardp.201400233] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/09/2014] [Revised: 07/28/2014] [Accepted: 08/15/2014] [Indexed: 11/07/2022]

Jung S, Jang K, Yoon Y, Kang S. Contributing factors to vehicle to vehicle crash frequency and severity under rainfall. J Safety Res 2014;50:1-10. [PMID: 25142355 DOI: 10.1016/j.jsr.2014.01.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/01/2013] [Revised: 01/09/2014] [Accepted: 01/14/2014] [Indexed: 06/03/2023]

Xiao F, Gulliver JS, Simcik MF. Perfluorooctane sulfonate (PFOS) contamination of fish in urban lakes: a prioritization methodology for lake management. Water Res 2013;47:7264-7272. [PMID: 24184022 DOI: 10.1016/j.watres.2013.09.063] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/26/2013] [Revised: 08/06/2013] [Accepted: 09/01/2013] [Indexed: 06/02/2023]

Marino MJ. The use and misuse of statistical methodologies in pharmacology research. Biochem Pharmacol 2013;87:78-92. [PMID: 23747488 DOI: 10.1016/j.bcp.2013.05.017] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 04/23/2013] [Accepted: 05/20/2013] [Indexed: 11/27/2022]

Sato Y, Gosho M, Toshimori K. Usefulness of statistics for establishing evidence-based reproductive medicine. Reprod Med Biol 2011;11:49-58. [PMID: 29699105 DOI: 10.1007/s12522-011-0106-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2011] [Accepted: 07/12/2011] [Indexed: 11/29/2022]