Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 7 (from Reference Citation Analysis)

Searched Name: Data merging

Polaka I, Razuka-Ebela D, Park JY, Leja M. Taxonomy-based data representation for data mining: an example of the magnitude of risk associated with H. pylori infection. BioData Min 2021;14:43. [PMID: 34454568 PMCID: PMC8400764 DOI: 10.1186/s13040-021-00271-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2021] [Accepted: 08/08/2021] [Indexed: 11/10/2022]

Abstract

BACKGROUND

The amount of available and potentially significant data describing study subjects is ever growing with the introduction and integration of different registries and data banks. The individual attributes of these data are not always necessary; more often, membership in a specific group (e.g. diet, social 'bubble', living area) is enough to build a successful machine learning or data mining model without overfitting it. Therefore, in this article we propose an approach to building taxonomies using clustering to replace detailed data from large heterogeneous data sets from different sources, while improving interpretability. We used the GISTAR study database, which holds exhaustive self-assessment questionnaire data, to demonstrate this approach in the task of differentiating between H. pylori positive and negative study participants and assessing their potential risk factors. We compared the results of taxonomy-based classification to the results of classification using raw data.

RESULTS

Our approach was evaluated using 6 classification algorithms that induce rule-based or tree-based classifiers. The taxonomy-based classification results show no significant loss of information, with similar and up to 2.5% better classification accuracy. Information held by 10 or more attributes can be replaced by one attribute indicating membership in a cluster in a hierarchy at a specific cut. The clusters created this way can be easily interpreted by researchers (doctors, epidemiologists) and describe the co-occurring features in the group, which is significant for the specific task.

CONCLUSIONS

While there are always features and measurements that must be used in data analysis as they are, using taxonomies in parallel to describe study subjects makes it possible to model membership in specific naturally occurring groups and the impact of those groups on an outcome. This can decrease the risk of overfitting (picking attributes and values specific to the training set without explaining the underlying conditions), improve the accuracy of the models, and improve privacy protection of study participants by decreasing the amount of specific information that could be used to identify an individual.
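
The idea of replacing many attributes with one cluster-membership attribute can be illustrated with a minimal sketch. It is not the authors' implementation: the distance measure (Hamming), the linkage (average), the function names and the six toy questionnaire rows are all assumptions made for illustration. Stopping the merging at a fixed number of clusters plays the role of cutting the hierarchy at a specific level.

```python
from itertools import combinations


def hamming(a, b):
    """Fraction of attributes on which two subjects differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def agglomerate(rows, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters.

    Stopping early corresponds to cutting the hierarchy at one level.
    Returns one cluster label per row.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            total = sum(hamming(rows[p], rows[q]) for p in ci for q in cj)
            d = total / (len(ci) * len(cj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Hypothetical binary questionnaire answers: 6 subjects, 6 attributes each.
rows = [
    (1, 1, 1, 0, 0, 0),
    (1, 1, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 1),
    (0, 0, 0, 1, 1, 1),
    (0, 0, 1, 1, 1, 1),
    (0, 0, 0, 1, 1, 0),
]

labels = agglomerate(rows, n_clusters=2)

# Taxonomy-based representation: one membership attribute per subject
# instead of the six raw attributes.
compact = [{"group": label} for label in labels]
print(labels)
```

The `compact` records could then be passed to a rule-based or tree-based classifier in place of the raw attributes. For real data sets, a library routine for hierarchical clustering would be a more practical choice than this quadratic pure-Python loop.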

Winter C, Jung K. Mining Protein Expression Databases Using Network Meta-Analysis. Methods Mol Biol 2021;2228:419-431. [PMID: 33950507 DOI: 10.1007/978-1-0716-1024-4_29] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]

Pedersen JW, Larsen LH, Thirsing C, Vezzaro L. Reconstruction of corrupted datasets from ammonium-ISE sensors at WRRFs through merging with daily composite samples. Water Res 2020;185:116227. [PMID: 32736284 DOI: 10.1016/j.watres.2020.116227] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/25/2020] [Revised: 07/06/2020] [Accepted: 07/23/2020] [Indexed: 06/11/2023]

Abstract

Long-term, continuous datasets of high quality are important for instrumentation, control, and automation (ICA) efforts at wastewater resource recovery facilities (WRRFs). This study presents a methodology to increase the reliability of measurements from ammonium ion-selective electrodes (ISEs). This is done by correcting corrupted ISE data with a data source that is often available at WRRFs (volume-proportional composite samples). A yearlong measurement campaign showed that the existing standard protocols for sensor maintenance might still create corrupted datasets, with poor sensor recalibrations responsible for abrupt and unrealistic jumps in the measurements. The proposed automatic correction methodology removes both recalibration jumps and signal drift by using information from composite samples that are already taken for reporting to legal authorities. Results showed that the developed methodology provided a continuous, high-quality time series without the major data quality issues of the original signal. In fact, the signal was improved for 87% of days when a reference sample was available. The effect of correcting the data before use in a data-driven software sensor was also investigated. The corrected dataset led to noticeably smaller day-to-day variations in estimated NH₄⁺ loads, and to large improvements in both median estimates and prediction bounds. The long time series allowed for an investigation of how much training data is required to fit a software sensor that provides estimates representative of the entire study period. The results showed that 8 weeks of data allowed for a good median estimate, while 16 weeks are required to obtain good 80% prediction bounds. Overall, the proposed method can increase the applicability of relatively cheap ISE sensors for ICA applications within WRRFs.
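
The abstract does not give the correction algorithm, so the following is only a simplified stand-in for the general idea of anchoring sensor data to daily composite samples. It assumes that one day of sensor readings can be rescaled by a single factor so that its flow-weighted mean matches the volume-proportional composite sample for that day. The function names and the numbers are hypothetical.

```python
def flow_weighted_mean(conc, flow):
    """Flow-weighted mean concentration, comparable to a volume-proportional composite."""
    return sum(c * q for c, q in zip(conc, flow)) / sum(flow)


def correct_day(conc, flow, composite):
    """Rescale one day of sensor readings to match the composite sample.

    Assumption: the sensor error within a day is a constant multiplicative
    offset (for example after a poor recalibration). Days without a reference
    sample are returned unchanged.
    """
    if composite is None:
        return list(conc)
    sensor_mean = flow_weighted_mean(conc, flow)
    if sensor_mean == 0:
        return list(conc)
    factor = composite / sensor_mean
    return [c * factor for c in conc]


# Hypothetical day: three sensor readings (mg/L), matching flows (m3/h),
# and a laboratory composite sample of 25 mg/L.
conc = [10.0, 20.0, 30.0]
flow = [1.0, 2.0, 1.0]
corrected = correct_day(conc, flow, composite=25.0)
print(corrected)
```

A per-day scale factor removes a constant offset within each day but would leave within-day drift untouched and can introduce steps at midnight; handling drift between reference samples would need interpolation of the factor over time, which this sketch omits.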

Liu L, O'Donnell P, Sullivan R, Katalinic A, Moser L, de Boer A, Meunier F. Cancer in Europe: Death sentence or life sentence? Eur J Cancer 2016;65:150-5. [PMID: 27498140 DOI: 10.1016/j.ejca.2016.07.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/30/2016] [Accepted: 07/05/2016] [Indexed: 11/26/2022]

Aller P, Geng T, Evans G, Foadi J. Applications of the BLEND Software to Crystallographic Data from Membrane Proteins. Adv Exp Med Biol 2016;922:119-35. [PMID: 27553239 DOI: 10.1007/978-3-319-35072-1_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/07/2023]

Hokamp K. Perl One-Liners: Bridging the Gap Between Large Data Sets and Analysis Tools. Methods Mol Biol 2015;1326:177-91. [PMID: 26498621 DOI: 10.1007/978-1-4939-2839-2_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]

Wang F, Song PXK, Wang L. Merging multiple longitudinal studies with study-specific missing covariates: A joint estimating function approach. Biometrics 2015;71:929-40. [PMID: 26193911 DOI: 10.1111/biom.12356] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015] [Indexed: 11/28/2022]