1. Design and Implementation of a Fuzzy Classifier for FDI Applied to Industrial Machinery. Sensors (Basel) 2023; 23:6954. PMID: 37571738; PMCID: PMC10422568; DOI: 10.3390/s23156954.
Abstract
In the present work, the design and implementation of a Fault Detection and Isolation (FDI) system for industrial machinery is proposed. The case study is a multishaft centrifugal compressor used for syngas manufacturing. The system is designed to monitor the faults that may damage the multishaft centrifugal compressor: single and multiple instrument faults are considered, as well as process faults such as fouling of the compressor stages and breakage of the thrust bearing. A new approach that combines Principal Component Analysis (PCA), cluster analysis, and pattern recognition is developed. A novel procedure based on the ANOVA (ANalysis Of VAriance) statistical test is applied to determine the most suitable number of Principal Components (PCs). A key design element of the proposed fault isolation scheme is the cluster analysis of the data, performed to contain the growth in complexity experienced when analyzing process faults, which typically involve many variables. In addition, an automatic online pattern recognition procedure for finding the most probable faults is proposed. The clustering procedure and pattern recognition are implemented within a fuzzy fault classifier module. Experimental results on real plant data illustrate the validity of the approach. The main benefits of the FDI system are improved maintenance operations, enhanced reliability and availability of the compressor, increased plant safety, and reduced operating costs.
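As a simplified illustration of the PC-selection step, the sketch below picks the number of components by a cumulative explained-variance threshold. This is a hypothetical stand-in for the paper's ANOVA-based procedure; the sensor matrix and the 0.95 threshold are invented.

```python
import numpy as np

def n_components_by_variance(X, threshold=0.95):
    """Number of principal components needed to explain at least
    `threshold` of the total variance (a simplified, hypothetical
    stand-in for the ANOVA-based selection in the paper)."""
    Xc = X - X.mean(axis=0)                  # center each sensor channel
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values
    ratio = np.cumsum(s**2) / np.sum(s**2)   # cumulative explained variance
    return int(np.searchsorted(ratio, threshold) + 1)

# toy "sensor" matrix driven by exactly two latent signals
t = np.linspace(0.0, 1.0, 100)
s1, s2 = np.sin(2 * np.pi * t), np.cos(4 * np.pi * t)
X = np.column_stack([s1, s2, s1 + s2, s1 - s2, 0.5 * s1, 0.3 * s2])
print(n_components_by_variance(X))  # -> 2: rank-2 data needs two PCs
```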
2. Value of baseline characteristics in the risk prediction of atrial fibrillation. Front Cardiovasc Med 2023; 10:1068562. PMID: 36818333; PMCID: PMC9928725; DOI: 10.3389/fcvm.2023.1068562.
Abstract
Introduction: Atrial fibrillation (AF) predisposes patients to heart failure and stroke. Early management can effectively reduce stroke rates and mortality. Current clinical guidelines screen high-risk individuals based solely on age; this study explores other possible AF risk predictors. Methods: A total of 18,738 elderly people (aged over 60 years) in Chinese communities were enrolled in this study. The baseline characteristics were mainly based on the diagnoses of an electrocardiogram (ECG) machine during follow-up, supplemented by basic physical examination data. After analysis of both independent and combined baseline characteristics, AF risk predictors were obtained and prioritized according to the results. Independent characteristics were studied from three aspects: the chi-square test, the Mann-Whitney U test, and Cox univariate regression analysis. Combined characteristics were studied from two aspects: machine learning models (combined with recursive feature elimination and voting decisions) and Cox multivariate regression analysis. Results: The resulting optimal combination of risk predictors included age, atrial premature beats, atrial flutter, left ventricular hypertrophy, hypertension, and heart disease. Conclusion: Patients diagnosed by short-duration ECG with any of the above events had a higher probability of AF episodes and should be prioritized for long-term ECG monitoring or increased screening density. The incidence of risk predictors in different age ranges of AF patients suggests differences in age-specific patient management. These findings can help improve the detection rate of AF, standardize patient management, and slow the progression of AF.
3. SpatialCorr identifies gene sets with spatially varying correlation structure. Cell Rep Methods 2022; 2:100369. PMID: 36590683; PMCID: PMC9795364; DOI: 10.1016/j.crmeth.2022.100369.
Abstract
Recent advances in spatially resolved transcriptomics technologies enable both the measurement of genome-wide gene expression profiles and their mapping to spatial locations within a tissue. A first step in spatial transcriptomics data analysis is identifying genes with expression that varies spatially, and robust statistical methods exist to address this challenge. While useful, these methods do not detect spatial changes in the coordinated expression within a group of genes. To this end, we present SpatialCorr, a method for identifying sets of genes with spatially varying correlation structure. Given a collection of gene sets pre-defined by a user, SpatialCorr tests for spatially induced differences in the correlation of each gene set within tissue regions, as well as between and among regions. An application to cutaneous squamous cell carcinoma demonstrates the power of the approach for revealing biological insights not identified using existing methods.
4. Comparison of Electrodermal Activity from Multiple Body Locations Based on Standard EDA Indices' Quality and Robustness against Motion Artifact. Sensors (Basel) 2022; 22:3177. PMID: 35590866; PMCID: PMC9104297; DOI: 10.3390/s22093177.
Abstract
The most traditional sites for electrodermal activity (EDA) data collection, palmar locations such as fingers or palms, are not usually recommended for ambulatory monitoring given that subjects have to use their hands regularly during their daily activities, and therefore, alternative sites are often sought for EDA data collection. In this study, we collected EDA signals (n = 23 subjects, 19 male) from four measurement sites (forehead, back of neck, finger, and inner edge of foot) during cognitive stress and induction of mild motion artifacts by walking and one-handed weightlifting. Furthermore, we computed several EDA indices from the EDA signals obtained from different sites and evaluated their efficiency to classify cognitive stress from the baseline state. We found a high within-subject correlation between the EDA signals obtained from the finger and the feet. Consistently high correlation was also found between the finger and the foot EDA in both the phasic and tonic components. Statistically significant differences were obtained between the baseline and cognitive stress stage only for the EDA indices computed from the finger and the foot EDA. Moreover, the receiver operating characteristic curve for cognitive stress detection showed a higher area-under-the-curve for the EDA indices computed from the finger and foot EDA. We also evaluated the robustness of the different body sites against motion artifacts and found that the foot EDA location was the best alternative to other sites.
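The AUC comparison above can be computed for any one-dimensional index via the rank (Mann-Whitney) identity, without tracing the ROC curve point by point. A minimal sketch; the per-subject baseline/stress index values are invented.

```python
import numpy as np

def roc_auc(neg, pos):
    """Area under the ROC curve for a scalar index, using the identity
    AUC = P(pos > neg) + 0.5 * P(pos == neg) over all pairs."""
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    diff = pos[:, None] - neg[None, :]          # all pos-vs-neg pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# hypothetical per-subject EDA index at baseline vs. cognitive stress
baseline = [0.8, 1.1, 0.9, 1.3, 1.0]
stress   = [1.4, 1.9, 1.2, 2.0, 1.6]
print(round(roc_auc(baseline, stress), 3))  # -> 0.96
```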
5. Is the New EN689 a Better Standard to Test Compliance With Occupational Exposure Limits in the Workplace? Ann Work Expo Health 2021; 66:412-415. PMID: 34864829; PMCID: PMC8922169; DOI: 10.1093/annweh/wxab111.
Abstract
Objective: To evaluate the performance of three measurement strategies for testing compliance with occupational exposure limits in similarly exposed groups (SEGs): the old and new versions of EN689, and the BOHS-NVvA guidance on measuring compliance. Methods: Respirable dust exposure concentrations (n = 1383) measured within the member companies of IMA-Europe were used to compare compliance decisions between the three measurement strategies. A total of 210 SEGs, of which 158 had repeated measurements, were analysed. An R package, OHcomplianceStrategies, was created for this purpose. Results: The old EN689 strategy resulted in the highest number of compliant SEGs in the preliminary and statistical tests (49–52% and 83%), with lower percentages of compliance under the new EN689 standard (32–44% and 71%). The percentage of non-compliant SEGs was relatively similar between the old and new EN689 for the preliminary tests (1–12% versus 6–11%). However, the new EN689 declared almost twofold more SEGs non-compliant when applying the statistical test (29% versus 17%). The BOHS-NVvA individual test fell in between, with 26% non-compliant SEGs. Conclusion: This study showed differences in compliance decisions between the old and new EN689, with the new EN689 being considerably more stringent and resulting in more non-compliant SEGs.
6. Design and Analysis Methods for Trials with AI-Based Diagnostic Devices for Breast Cancer. J Pers Med 2021; 11:1150. PMID: 34834502; PMCID: PMC8617855; DOI: 10.3390/jpm11111150.
Abstract
Imaging is important in cancer diagnostics. It takes years of medical training and clinical experience for radiologists to interpret diagnostic images accurately. With the advance of big-data analysis, machine learning and AI-based devices are under development and beginning to play a role in imaging diagnostics. If an AI-based imaging device can read images as accurately as experienced radiologists, it may help radiologists improve their reading accuracy and manage their workloads. In this paper, we consider two potential study objectives for a clinical trial evaluating an AI-based device for breast cancer diagnosis by comparing its concordance with human radiologists. We propose statistical design and analysis methods for each study objective. Extensive numerical studies show that the proposed statistical testing methods control the type I error rate accurately and that the design methods provide required sample sizes with statistical powers close to the pre-specified nominal levels. The proposed methods were successfully used to design and analyze a real device trial.
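Concordance between an AI reader and a human reader is often summarized by a chance-corrected agreement measure such as Cohen's kappa; the sketch below computes it for hypothetical binary reads. The paper's own design and test statistics may differ; this only illustrates the concordance idea.

```python
import numpy as np

def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary readers:
    kappa = (observed agreement - chance agreement) / (1 - chance)."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                         # observed agreement
    p_a1, p_b1 = a.mean(), b.mean()
    pe = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)   # agreement by chance
    return (po - pe) / (1 - pe)

# hypothetical positive/negative reads on 8 images
ai    = [1, 0, 1, 1, 0, 0, 1, 0]
human = [1, 0, 1, 0, 0, 0, 1, 1]
print(round(cohens_kappa(ai, human), 3))  # -> 0.5
```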
7. How to choose and interpret a statistical test? An update for budding researchers. J Family Med Prim Care 2021; 10:2763-2767. PMID: 34660402; PMCID: PMC8483143; DOI: 10.4103/jfmpc.jfmpc_433_21.
Abstract
Postgraduate medical students are often unable to select statistical tests and interpret their findings during their thesis or research projects. To select the appropriate tests, researchers need to determine the objectives of the study, the types of variables, the analysis and study design, the number of groups and data sets, and the type of distribution. In this review, we summarize and explain various statistical tests to help postgraduate medical students select the most appropriate techniques for their theses and dissertations.
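As a rough illustration of the selection logic described above, the toy helper below maps a few design features to commonly recommended tests. It is a coarse teaching aid under simplified assumptions, not a substitute for the review's guidance.

```python
def suggest_test(outcome, groups, paired=False, normal=True):
    """Map a study design to a commonly recommended statistical test.
    outcome: "continuous" or "categorical"; groups: number of groups;
    paired: repeated/matched measurements; normal: roughly normal data."""
    if outcome == "categorical":
        return "McNemar test" if paired else "Chi-square test"
    if groups == 2:
        if paired:
            return "Paired t-test" if normal else "Wilcoxon signed-rank test"
        return "Unpaired t-test" if normal else "Mann-Whitney U test"
    # three or more groups, continuous outcome
    return "One-way ANOVA" if normal else "Kruskal-Wallis test"

print(suggest_test("continuous", groups=2, paired=False, normal=False))
# -> Mann-Whitney U test
```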
8. A Comparative Study of Common Nature-Inspired Algorithms for Continuous Function Optimization. Entropy (Basel) 2021; 23:874. PMID: 34356415; PMCID: PMC8304592; DOI: 10.3390/e23070874.
Abstract
Over previous decades, many nature-inspired optimization algorithms (NIOAs) have been proposed and applied owing to their importance and significance. Several surveys have investigated NIOAs and their variants and applications. However, these studies mainly focus on a single NIOA, and a comprehensive comparative and contrastive study of existing NIOAs is lacking. To fill this gap, we conducted this comprehensive survey. More than 120 meta-heuristic algorithms were collected and, among them, the 11 most popular and common NIOAs were selected. Their accuracy, stability, efficiency, and parameter sensitivity are evaluated on the 30 black-box optimization benchmarking (BBOB) functions. Furthermore, we apply the Friedman and Nemenyi tests to analyze the performance of the compared NIOAs. We provide a unified formal description of the 11 NIOAs in order to compare their similarities and differences in depth, and a systematic summary of the challenging problems and research directions for the NIOA field as a whole. This comparative study aims to provide a broader perspective and meaningful insight for understanding NIOAs.
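The Friedman test used above ranks each algorithm within each benchmark function and asks whether the mean ranks differ. A minimal sketch of the test statistic on an invented 4x3 score matrix (lower score = better; no ties assumed):

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-square statistic for an (n blocks x k treatments)
    score matrix; ranks are assigned within each block, 1 = smallest.
    Assumes no ties within a block."""
    n, k = scores.shape
    # rank within each block via the double-argsort trick
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1
    mean_ranks = ranks.mean(axis=0)
    return 12 * n / (k * (k + 1)) * np.sum((mean_ranks - (k + 1) / 2) ** 2)

# hypothetical error values of 3 algorithms on 4 benchmark functions
scores = np.array([[0.1, 0.3, 0.2],
                   [0.2, 0.5, 0.4],
                   [0.1, 0.4, 0.3],
                   [0.3, 0.6, 0.2]])
print(round(friedman_statistic(scores), 2))  # -> 6.5, compared to chi2(k-1)
```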
9. Between-group comparison of area under the curve in clinical trials with censored follow-up: Application to HIV therapeutic vaccines. Stat Methods Med Res 2021; 30:2130-2147. PMID: 34218746; DOI: 10.1177/09622802211023963.
Abstract
In clinical trials, longitudinal data are commonly analyzed and compared between groups using a single summary statistic, such as the area under the outcome-versus-time curve (AUC). However, incomplete data, arising from censoring due to a limit of detection or from missingness, can bias these analyses. In this article, we present a statistical test based on a spline-based mixed model that accounts for both the censoring and missingness mechanisms in the AUC estimation. The inferential properties of the proposed method were evaluated and compared with ad hoc approaches and a non-parametric method through a simulation study based on a two-armed trial in which trajectories and the proportion of missing data were varied. Simulation results highlight that our approach has significant advantages over the other methods. A real working example from two HIV therapeutic vaccine trials illustrates the applicability of our approach.
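The naive per-subject summary that censoring and missingness can bias is the trapezoidal area under the observed trajectory. A minimal sketch on an invented trajectory:

```python
import numpy as np

def auc_trapezoid(times, values):
    """Area under an outcome-versus-time curve by the trapezoidal rule,
    the simple per-subject summary that the paper's model improves on
    when observations are censored or missing."""
    t, y = np.asarray(times, float), np.asarray(values, float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

# hypothetical outcome trajectory measured at five visits (weeks)
times  = [0, 4, 8, 12, 24]
values = [2.0, 3.5, 3.0, 2.5, 2.0]
print(auc_trapezoid(times, values))  # -> 62.0
```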
10. Colonization process determines species diversity via competitive quasi-exclusion. Ecol Evol 2021; 11:4470-4480. PMID: 33976823; PMCID: PMC8093681; DOI: 10.1002/ece3.7342.
Abstract
A colonization model provides a useful basis for investigating the role of interspecific competition in species diversity. The model formulates colonization processes of propagules competing for spatially distinct habitats, which is known to result in stable coexistence of multiple species under various trade-offs, for example, competition-colonization and fecundity-mortality trade-offs. Based on this model, we propose a new theory to explain patterns of species abundance, assuming a trade-off between competitive ability and fecundity among species. The model makes testable predictions about species positions in the rank abundance diagram under discrete species competitiveness. The predictions were tested against three animal-community datasets, which supported our model, suggesting the importance of interspecific competition in community structure. Our approach provides new insight into the mechanisms of species diversity.
11. More Confidence Intervals and Fewer p Values: A Positive Trend? J Am Coll Cardiol 2021; 77:1562-1563. PMID: 33766263; DOI: 10.1016/j.jacc.2021.02.004.
12.
Abstract
The drift-diffusion model (DDM) is a model of sequential sampling with diffusion signals, where the decision maker accumulates evidence until the process hits either an upper or lower stopping boundary and then stops and chooses the alternative that corresponds to that boundary. In perceptual tasks, the drift of the process is related to which choice is objectively correct, whereas in consumption tasks, the drift is related to the relative appeal of the alternatives. The simplest version of the DDM assumes that the stopping boundaries are constant over time. More recently, a number of papers have used nonconstant boundaries to better fit the data. This paper provides a statistical test for DDMs with general, nonconstant boundaries. As a by-product, we show that the drift and the boundary are uniquely identified. We use our condition to nonparametrically estimate the drift and the boundary and construct a test statistic based on finite samples.
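A simple way to see the role of the boundary is to simulate the process directly. The sketch below simulates one DDM trial with a user-supplied, possibly time-varying boundary; the collapsing boundary shape and all parameter values are hypothetical, and this is forward simulation only, not the paper's statistical test.

```python
import numpy as np

def simulate_ddm(drift, boundary, dt=0.001, sigma=1.0, t_max=5.0, rng=None):
    """Simulate one drift-diffusion trial with boundary function
    t -> b(t) > 0. Returns (choice, reaction_time): choice is +1/-1
    for the upper/lower boundary, or None if no hit by t_max."""
    rng = rng if rng is not None else np.random.default_rng()
    x, t = 0.0, 0.0
    while t < t_max:
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal()  # Euler step
        t += dt
        b = boundary(t)
        if x >= b:
            return +1, t
        if x <= -b:
            return -1, t
    return None, t_max

# constant vs. collapsing boundaries (the generalization the paper tests)
rng = np.random.default_rng(1)
constant   = lambda t: 1.0
collapsing = lambda t: 1.0 / (1.0 + t)   # hypothetical collapsing shape
choice, rt = simulate_ddm(0.8, collapsing, rng=rng)
print(choice in (+1, -1), 0 < rt <= 5.0)
```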
13. Between-Batch Bioequivalence (BBE): a Statistical Test to Evaluate In Vitro Bioequivalence Considering the Between-Batch Variability. AAPS J 2020; 22:119. PMID: 32910283; PMCID: PMC7651657; DOI: 10.1208/s12248-020-00486-5.
Abstract
Bioequivalence testing is an essential step during the development of generic drugs. Regulatory agencies have drafted recommendations and guidelines to frame this step, but without reaching a consensus: different methodologies are applied depending on the geographical region. For instance, in the EU, the EMA recommends the average bioequivalence (ABE) test, while in the USA, the FDA recommends the population bioequivalence (PBE) test. Both methods have limitations (e.g., when batch variability is non-negligible) that make it difficult to conclude equivalence without subsequently increasing the sample size. This article proposes an alternative method to evaluate bioequivalence: between-batch bioequivalence (BBE). It is based on comparing the mean difference (Reference − Test) with the Reference between-batch variability. After presenting the theoretical concepts, the relevance of BBE is evaluated through simulation and real-case (nasal spray) studies. Simulation results showed high performance of the method based on estimated false positive and false negative rates (type I and type II errors, respectively). In particular, BBE showed significantly greater true positive rates than ABE and PBE when the Reference residual standard deviation exceeds 15%, depending on the between-batch variability and the number of batches. Finally, real-case applications revealed that BBE is more efficient than ABE and PBE at demonstrating equivalence in some well-known situations where the between-batch variability is not negligible. These results suggest that BBE could be considered an alternative to the state-of-the-art methods, allowing less costly development.
14. A novel statistical method for interpreting the pathogenicity of rare variants. Genet Med 2020; 23:59-68. PMID: 32884132; PMCID: PMC7796914; DOI: 10.1038/s41436-020-00948-3.
Abstract
Purpose: To achieve the ultimate goal of personalized treatment, accurate molecular diagnosis and precise interpretation of the impact of genetic variants on gene function are essential. With sequencing costs becoming increasingly affordable, accurately distinguishing benign from pathogenic variants has become the major bottleneck. Although large normal-population sequence databases have become a key resource for filtering benign variants, they are not effective at filtering extremely rare variants. Methods: To address this challenge, we developed a novel statistical test that combines sequencing data from a patient cohort with a normal-population control database. By comparing the expected and observed allele frequencies in the patient cohort, variants that are likely benign can be identified. Results: The performance of this new method was evaluated on both simulated and real datasets, coupled with experimental validation. We demonstrate that the new test is well powered to identify benign variants and is particularly effective for variants with low frequency in the normal population. Conclusion: As a general test that can be applied to any type of variant in the context of all Mendelian diseases, our work provides a general framework for filtering benign variants with very low population allele frequency.
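The comparison of expected and observed allele counts can be illustrated with an exact binomial tail probability. The cohort size, population frequency, and observed count below are invented, and this is only the generic enrichment calculation, not the paper's specific test.

```python
from math import comb

def binomial_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p),
    via the complement of the lower tail (only k terms)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# hypothetical rare variant: population allele frequency 1e-4,
# 3 alleles observed among 2 x 5000 chromosomes in the patient cohort
n_chrom, pop_af, observed = 10_000, 1e-4, 3
p_enrich = binomial_sf(observed, n_chrom, pop_af)
print(p_enrich < 0.05)  # -> False: no significant enrichment over expectation
```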
15. A Simple Method to Identify the Dominant Fouling Mechanisms during Membrane Filtration Based on Piecewise Multiple Linear Regression. Membranes 2020; 10:171. PMID: 32751292; PMCID: PMC7465108; DOI: 10.3390/membranes10080171.
Abstract
Membrane fouling is a complicated issue in microfiltration and ultrafiltration. Clearly identifying the dominant fouling mechanisms during the filtration process is of great significance for the phased and targeted control of fouling. To this end, we propose a semi-empirical multiple linear regression model to describe flux decline, incorporating the five fouling mechanisms (the first and second kinds of standard blocking, complete blocking, intermediate blocking, and cake filtration) based on the additivity of the permeate volume contributed by different coexisting mechanisms. A piecewise fitting protocol was established to distinguish the fouling stages and find the significant mechanisms in each stage. This approach was applied to a case study of a microfiltration membrane filtering a model foulant solution composed of polysaccharide, protein, and humic substances, and the model fitting unequivocally revealed that the dominant fouling mechanism evolved in the sequence of initial adaptation, fast adsorption followed by slow adsorption inside the membrane pores, and the gradual growth of a cake/gel layer on the membrane surface. The results were in good agreement with the permeate properties (total organic carbon, ultraviolet absorbance, and fluorescence) during the filtration process. This modeling approach proves to be simple and reliable for identifying the main fouling mechanisms during membrane filtration with statistical confidence.
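The piecewise fitting idea, fitting simple regressions on segments and locating the change of regime, can be sketched with a single regressor and one breakpoint chosen by least squares. This is a toy version of the paper's multi-mechanism protocol; the flux-like data are invented.

```python
import numpy as np

def best_breakpoint(x, y, min_pts=3):
    """Fit two straight lines with a single breakpoint chosen to
    minimize total squared error; returns the breakpoint index."""
    best_sse, best_i = np.inf, None
    for i in range(min_pts, len(x) - min_pts):
        sse = 0.0
        for xs, ys in ((x[:i], y[:i]), (x[i:], y[i:])):
            A = np.column_stack([np.ones_like(xs), xs])     # intercept + slope
            coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
            sse += float(np.sum((A @ coef - ys) ** 2))
        if sse < best_sse:
            best_sse, best_i = sse, i
    return best_i

x = np.arange(12, dtype=float)
# decline slope changes from -1 to -0.3 at index 6 (two fouling regimes)
y = np.where(x < 6, 10 - x, 7 - 0.3 * x)
print(best_breakpoint(x, y))  # -> 6
```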
16. An Energy-Efficient Redundant Transmission Control Clustering Approach for Underwater Acoustic Networks. Sensors 2019; 19:4241. PMID: 31574894; PMCID: PMC6806349; DOI: 10.3390/s19194241.
Abstract
Underwater Acoustic Networks (UANs) are an emerging technology with attractive applications. In this type of network, control overhead, management of redundant inner-network transmissions, and data similarity remain very challenging. Cluster-based frameworks manage control overhead and redundant inner-network transmissions effectively. However, current clustering protocols consume a large part of their energy resources on similar data, as they periodically sense and forward the same information. In this paper, we introduce a novel two-level Redundant Transmission Control (RTC) approach that assesses data similarity using statistical tests with an appropriate degree of confidence. The Cluster Head (CH) and the Region Head (RH) then remove similar data from the original data before forwarding it to the next level. We also introduce a new spatiotemporal, dynamic CH role-rotation technique that can accommodate field nodes that drift with water-current movements. A key feature of the proposed model is that the RH controls the communication and redundant transmissions between the CH and the Mobile Sink (MS), while the CH controls the redundant inner-network transmissions and data similarity among the cluster members. We conduct simulations to evaluate the performance of the designed framework against recent schemes under different criteria, such as average end-to-end delay, packet delivery ratio, and network energy consumption. The results reveal that the proposed model outperforms current approaches in terms of the selected metrics.
17. Detection of focal electroencephalogram signals using higher-order moments in EMD-TKEO domain. Healthc Technol Lett 2019; 6:64-69. PMID: 31341630; PMCID: PMC6595538; DOI: 10.1049/htl.2018.5036.
Abstract
Detection of an epileptogenic focus based on electroencephalogram (EEG) signal screening is an important pre-surgical step for removing affected regions of the human brain. In this work, a novel technique for the detection of focal EEG signals is proposed using a combination of empirical mode decomposition (EMD) and the Teager–Kaiser energy operator (TKEO). EEG signals belonging to focal (Fo) and non-focal (NFo) groups were first decomposed into a set of intrinsic mode functions (IMFs) using EMD. Next, the TKEO was applied to each IMF, and two higher-order statistical moments, skewness and kurtosis, were extracted as features from the TKEO of each IMF. The statistical significance of the selected features was evaluated using Student's t-test, and on this basis, features from the first three IMFs, which show very high discriminative capability, were selected as inputs to a support vector machine classifier for discriminating Fo and NFo signals. A classification accuracy of 92.65% was obtained using a radial basis kernel function, which demonstrates the efficacy of the proposed EMD-TKEO-based feature extraction method for computer-based treatment of patients suffering from focal seizures.
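Skewness and kurtosis are the third and fourth standardized moments; they can be computed directly from the definitions. A minimal sketch on an invented right-skewed sample (this illustrates the features only, not the full EMD-TKEO pipeline):

```python
import numpy as np

def skewness(x):
    """Third standardized moment (0 for a symmetric distribution)."""
    x = np.asarray(x, float)
    m, s = x.mean(), x.std()
    return float(np.mean(((x - m) / s) ** 3))

def kurtosis(x):
    """Fourth standardized moment (3.0 for a normal distribution)."""
    x = np.asarray(x, float)
    m, s = x.mean(), x.std()
    return float(np.mean(((x - m) / s) ** 4))

x = np.array([1.0, 2.0, 2.0, 3.0, 9.0])   # right-skewed toy sample
print(skewness(x) > 0, round(kurtosis(x), 2))  # -> True 3.02
```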
18. [On the use of mathematical statistics methods in clinical and experimental studies]. Adv Gerontol (Uspekhi Gerontologii) 2019; 32:1052-1062. PMID: 32160448.
Abstract
This paper considers a number of issues, including conceptual ones, related to making statistical decisions, choosing tests, and computing characteristics that strengthen the evidence base of conclusions drawn when analyzing data with the methods of mathematical statistics. The aim is not to describe the methods of mathematical statistics themselves, but to analyze the conditions under which the most common tests should be applied. In particular, the paper discusses the indicator of statistical significance of observed effects, the p-value, and the sample size needed to obtain a significant effect, and considers the effect of multiple comparisons, the application of the Bayesian approach, and other topics.
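One standard adjustment for the multiple-comparisons effect discussed above is Holm's step-down procedure; a self-contained sketch (the p-values are invented):

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Reject/accept flag per hypothesis under Holm's step-down
    procedure: compare the i-th smallest p-value to alpha / (m - i)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger p-values fail too
    return reject

print(holm_bonferroni([0.001, 0.04, 0.03, 0.20]))
# -> [True, False, False, False]
```

Only the smallest p-value survives here, even though 0.03 and 0.04 would each pass an unadjusted 0.05 threshold.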
19. Empirical Comparison of Publication Bias Tests in Meta-Analysis. J Gen Intern Med 2018; 33:1260-1267. PMID: 29663281; PMCID: PMC6082203; DOI: 10.1007/s11606-018-4425-7.
Abstract
BACKGROUND Decision makers rely on meta-analytic estimates to trade off benefits and harms. Publication bias impairs the validity and generalizability of such estimates. The performance of various statistical tests for publication bias has been largely compared using simulation studies and has not been systematically evaluated in empirical data. METHODS This study compares seven commonly used publication bias tests (i.e., Begg's rank test, trim-and-fill, Egger's, Tang's, Macaskill's, Deeks', and Peters' regression tests) based on 28,655 meta-analyses available in the Cochrane Library. RESULTS Egger's regression test detected publication bias more frequently than other tests (15.7% in meta-analyses of binary outcomes and 13.5% in meta-analyses of non-binary outcomes). The proportion of statistically significant publication bias tests was greater for larger meta-analyses, especially for Begg's rank test and the trim-and-fill method. The agreement among Tang's, Macaskill's, Deeks', and Peters' regression tests for binary outcomes was moderately strong (most κ's were around 0.6). Tang's and Deeks' tests had fairly similar performance (κ > 0.9). The agreement among Begg's rank test, the trim-and-fill method, and Egger's regression test was weak or moderate (κ < 0.5). CONCLUSIONS Given the relatively low agreement between many publication bias tests, meta-analysts should not rely on a single test and may apply multiple tests with various assumptions. Non-statistical approaches to evaluating publication bias (e.g., searching clinical trials registries, records of drug approving agencies, and scientific conference proceedings) remain essential.
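Egger's regression test, the most frequently positive test in this comparison, regresses the standardized effect on precision and inspects the intercept. A minimal sketch on an invented, roughly symmetric funnel; the full test also requires the intercept's standard error and a t-based p-value, which are omitted here.

```python
import numpy as np

def egger_intercept(effects, std_errors):
    """Intercept of Egger's regression: standardized effect (effect/SE)
    regressed on precision (1/SE). An intercept far from zero suggests
    small-study/publication bias."""
    e, se = np.asarray(effects, float), np.asarray(std_errors, float)
    y, x = e / se, 1.0 / se
    A = np.column_stack([np.ones_like(x), x])
    (intercept, _slope), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(intercept)

# hypothetical symmetric funnel: small studies not systematically larger
effects = [0.50, 0.52, 0.48, 0.55, 0.45]
ses     = [0.05, 0.10, 0.10, 0.20, 0.20]
print(abs(egger_intercept(effects, ses)) < 1.0)  # -> True: near-zero intercept
```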
20. A Novel Method for Assessing the Statistical Significance of RNA-RNA Interactions Between Two Long RNAs. J Comput Biol 2018; 25:976-986. PMID: 29963900; DOI: 10.1089/cmb.2017.0260.
Abstract
RNA-RNA interactions are key mechanisms through which noncoding RNA (ncRNA) regions exert biological functions. Computational prediction of RNA-RNA interactions is an essential method for detecting novel RNA-RNA interactions because their comprehensive detection by biological experimentation is still quite difficult. Many RNA-RNA interaction prediction tools have been developed, but they tend to produce many false positives. Accordingly, assessment of the statistical significance of computationally predicted interactions is an important task. However, there is no method to evaluate the statistical significance of RNA-RNA interactions that is applicable to interactions between two long RNA sequences. We developed a method to calculate the p-value for the minimal interaction energy between two long RNA sequences. The developed method depends on the fact that minimum interaction energies of RNA-RNA interactions between long RNAs follow a Gumbel distribution when repeat sequences in RNAs are masked. To show the usefulness of the developed method, we applied it to whole human 5'-untranslated region (UTR) and 3'-UTR sequences to detect novel 5'-UTR-3'-UTR interactions. We thus identified two significant 5'-UTR-3'-UTR interactions. Specifically, the human small proline-rich repeat protein 3 shows conserved 5'-UTR-3'-UTR interactions with some nucleotide variations preserving base pairings among primates. Our developed method enables us to detect statistically significant RNA-RNA interactions between long RNAs such as long ncRNAs. Statistical significance estimates help in identification of interactions for experimental validation and provide novel insights into the function of ncRNA regions.
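The paper's key device, a Gumbel null distribution for minimum interaction energies, makes the p-value a single CDF evaluation once the distribution's location and scale have been estimated. A minimal sketch, assuming the parameters have already been fit to a null set of energies (names and values below are illustrative, not the paper's):

```python
from scipy import stats

def interaction_pvalue(energy, loc, scale):
    """P-value P(E_min <= energy) under a Gumbel null for minima.

    In practice loc and scale would be fit (e.g., with stats.gumbel_l.fit)
    to minimum interaction energies of shuffled or unrelated sequence
    pairs; the parameters used in the test case are purely illustrative."""
    # gumbel_l is the extreme-value distribution for minima:
    # CDF(x) = 1 - exp(-exp((x - loc) / scale))
    return stats.gumbel_l.cdf(energy, loc=loc, scale=scale)
```

A very low (strongly negative) minimum energy then yields a small p-value, flagging the interaction as unlikely under the null.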
|
21
|
Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies. Genes (Basel) 2018; 9:E132. [PMID: 29495636 PMCID: PMC5867853 DOI: 10.3390/genes9030132] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2017] [Revised: 01/30/2018] [Accepted: 02/16/2018] [Indexed: 12/23/2022] Open
Abstract
Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.
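The test rests on the multispecies coalescent prediction that, around a zero-length internal branch, the three possible quartet topologies occur with equal frequency, so a goodness-of-fit test against a uniform 1/3 expectation applies. A rough sketch of that core idea (a simplification of the actual ASTRAL test, which operates on quartet frequencies with an effective number of gene trees):

```python
from scipy import stats

def polytomy_test(topology_counts):
    """Goodness-of-fit test of the three quartet-topology counts around
    a species-tree branch against the uniform 1/3 frequencies that the
    multispecies coalescent predicts for a zero-length (polytomy) branch.
    Returns the p-value; a small p rejects the polytomy null."""
    chi2, p = stats.chisquare(topology_counts)  # uniform expectation by default
    return p
```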
|
22
|
Towards integrated oncogenic marker recognition through mutual information-based statistically significant feature extraction: an association rule mining based study on cancer expression and methylation profiles. QUANTITATIVE BIOLOGY 2017; 5:302-327. [PMID: 30221015 DOI: 10.1007/s40484-017-0119-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Background Marker detection is an important task in complex disease studies. Here we provide an association rule mining (ARM) based approach for identifying integrated markers through mutual information (MI) based statistically significant feature extraction, and apply it to acute myeloid leukemia (AML) and prostate carcinoma (PC) gene expression and methylation profiles. Methods We first collect the genes having both expression and methylation values in AML as well as PC. Next, we run the Jarque-Bera normality test on the expression/methylation data to divide the whole dataset into two parts: one that follows a normal distribution and one that does not. This yields four parts of the dataset: normally distributed expression data, normally distributed methylation data, non-normally distributed expression data, and non-normally distributed methylation data. A feature-extraction technique, mRMR, is then applied to each part, resulting in a list of top-ranked genes. Next, we apply the Welch t-test (parametric) and the Shrink t-test (non-parametric) to the expression/methylation data of the top selected normally distributed and non-normally distributed genes, respectively. We then use a recent weighted ARM method, RANWAR, to combine all/specific resultant genes to generate top oncogenic rules along with the respective integrated markers. Finally, we perform a literature search as well as KEGG pathway and Gene Ontology (GO) analyses using the Enrichr database for in silico validation of the prioritized oncogenes as markers, labeling each marker as existing or novel. Results The novel markers of AML are {ABCB11↑∪KRT17↓} (i.e., ABCB11 up-regulated and KRT17 down-regulated) and {AP1S1-∪KRT17↓∪NEIL2-∪DYDC1↓} (i.e., AP1S1 and NEIL2 both hypo-methylated, and KRT17 and DYDC1 both down-regulated). The novel marker of PC is {UBIAD1¶∪APBA2‡∪C4orf31‡} (i.e., UBIAD1 up-regulated and hypo-methylated, and APBA2 and C4orf31 both down-regulated and hyper-methylated). Conclusion The identified novel markers might play critical roles in AML as well as PC. The approach can be applied to other complex diseases.
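The normality-based split described in the Methods can be sketched with SciPy's Jarque-Bera test: genes that pass go on to the parametric (Welch) branch, the rest to the non-parametric branch. The function name, significance threshold, and data layout below are assumptions, not the paper's code:

```python
import numpy as np
from scipy import stats

def normality_split(gene_rows, alpha=0.05):
    """Partition genes (one row of values per gene) by the Jarque-Bera
    normality test. Indices of genes passing the test go to the 'normal'
    branch (parametric test downstream); the rest go to the 'non-normal'
    branch (non-parametric test downstream)."""
    normal, non_normal = [], []
    for i, row in enumerate(gene_rows):
        _, p = stats.jarque_bera(row)
        (normal if p > alpha else non_normal).append(i)
    return normal, non_normal
```

Note that the Jarque-Bera test relies on sample skewness and kurtosis and is most reliable with reasonably large sample sizes.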
|
23
|
An Unbiased Estimate of Global Interrater Agreement. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2017; 77:721-742. [PMID: 29795928 PMCID: PMC5965630 DOI: 10.1177/0013164416654740] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Assessing global interrater agreement is difficult, as most published indices are affected by the presence of mixtures of agreements and disagreements. A previously proposed method was shown to be specifically sensitive to global agreement, excluding mixtures, but also negatively biased. Here, we propose two alternatives in an attempt to find what makes such methods so specific. The first method, RB, is found to be unbiased while still rejecting mixtures; it detects agreement with good power and is little affected by unequal category prevalence as soon as there are more than two categories.
|
24
|
Abstract
Statistical analyses are often conducted with α = .05. When multiple statistical tests are conducted, this threshold needs to be adjusted to compensate for the otherwise inflated Type I error. In tabletop gaming, it is sometimes desirable to roll a 20-sided die (or 'd20') twice and take the greater outcome. Here I draw from probability theory and the case of a d20, where the probability of obtaining any specific outcome is 1/20, to determine the probability of obtaining a specific outcome (Type I error) at least once across repeated, independent statistical tests.
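The underlying calculation is the familywise probability of at least one Type I error across k independent tests, 1 - (1 - α)^k, which is exactly the chance of rolling a given d20 face at least once in k rolls (α = 1/20). A minimal sketch, including the Šidák correction that inverts the formula:

```python
def fwer(alpha, k):
    """Probability of at least one Type I error across k independent
    tests at level alpha -- identical to the chance of rolling a given
    d20 face at least once in k rolls (alpha = 1/20)."""
    return 1.0 - (1.0 - alpha) ** k

def sidak_alpha(fwer_target, k):
    """Per-test level that keeps the familywise error rate at
    fwer_target across k independent tests (Sidak correction)."""
    return 1.0 - (1.0 - fwer_target) ** (1.0 / k)
```

For two rolls of a d20, fwer(1/20, 2) gives 0.0975: a 9.75% chance of seeing a specific face at least once.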
|
25
|
DERIVATION OF A TEST STATISTIC FOR EMPHYSEMA QUANTIFICATION. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING 2016; 2016:1269-1273. [PMID: 27974952 PMCID: PMC5153356 DOI: 10.1109/isbi.2016.7493498] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Density masking is the de facto quantitative imaging phenotype for emphysema that is widely used by the clinical community. Density masking defines the burden of emphysema by a fixed threshold, usually between -910 HU and -950 HU, that has been experimentally validated against histology. In this work, we formalize emphysema quantification by means of statistical inference. We show that a non-central Gamma distribution is a good approximation for the local distribution of image intensities for normal and emphysematous tissue. We then propose a test statistic in terms of the sample mean of a truncated non-central Gamma random variable. Our results show that this approach is well suited for the detection of emphysema and superior to standard density masking. The statistical method was tested on a dataset of 1337 samples obtained from 9 different scanner models in subjects with COPD. Results showed an improvement of 17% over the density masking approach, and an overall accuracy of 94.09%.
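The density-masking baseline the paper compares against reduces to counting the fraction of lung voxels below a fixed HU threshold. A minimal sketch (function name and default threshold are illustrative choices):

```python
import numpy as np

def density_mask_fraction(hu_values, threshold=-950):
    """Classic density masking: the fraction of lung voxels whose CT
    intensity (in Hounsfield units) falls below a fixed threshold
    (commonly between -910 and -950 HU). This is the baseline against
    which the Gamma-based test statistic is compared."""
    return float(np.mean(np.asarray(hu_values) < threshold))
```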
|
26
|
Prediction of protein essentiality by the support vector machine with statistical tests. Evol Bioinform Online 2013; 9:387-416. [PMID: 24250217 PMCID: PMC3795531 DOI: 10.4137/ebo.s11975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Open
Abstract
Essential proteins comprise the minimum set of proteins required to support cell life. Identifying essential proteins is important for understanding the cellular processes of an organism. However, identifying essential proteins experimentally is extremely time-consuming and labor-intensive, so alternative methods must be developed. This study had two goals: identifying the important features and building learning machines for discriminating essential proteins. Data for Saccharomyces cerevisiae and Escherichia coli were used. We first collected information from a variety of sources. We then proposed a modified backward feature selection method and built support vector machine (SVM) predictors based on the selected features. To evaluate performance, we conducted cross-validations on the original imbalanced data set and on a down-sampled balanced data set. Statistical tests were applied to the performance associated with the obtained feature subsets to confirm their significance. In the first data set, the best values of F-measure and Matthews correlation coefficient (MCC) were 0.549 and 0.495 in the imbalanced experiments; for the balanced experiment, the best values of F-measure and MCC were 0.770 and 0.545, respectively. In the second data set, the best values of F-measure and MCC were 0.421 and 0.407 in the imbalanced experiments; for the balanced experiment, they were 0.718 and 0.448, respectively. The experimental results show that our selected features are compact and improve performance. Prediction can also be conducted by users at the following internet address: http://bio2.cse.nsysu.edu.tw/esspredict.aspx.
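The two reported metrics, F-measure and MCC, follow directly from a binary confusion matrix; a small helper (hypothetical, not the authors' code) makes the definitions concrete:

```python
import math

def f_measure_mcc(tp, fp, fn, tn):
    """F-measure and Matthews correlation coefficient from the cells of
    a binary confusion matrix (true/false positives and negatives).
    Assumes at least one predicted and one actual positive, so the
    precision/recall denominators are non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f, mcc
```

MCC is often preferred over F-measure on imbalanced data, such as the original (non-down-sampled) set here, because it uses all four confusion-matrix cells.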
|
27
|
A Tool Preference Choice Method for RNA Secondary Structure Prediction by SVM with Statistical Tests. Evol Bioinform Online 2013; 9:163-84. [PMID: 23641141 PMCID: PMC3629938 DOI: 10.4137/ebo.s10580] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open
Abstract
The prediction of RNA secondary structures has drawn much attention from both biologists and computer scientists. Many useful tools have been developed for this purpose, each with individual strengths and weaknesses. Accordingly, we propose a tool choice method, based on support vector machines (SVM), which integrates three prediction tools: pknotsRG, RNAStructure, and NUPACK. Our method first extracts features from the target RNA sequence and adopts two information-theoretic feature selection methods for feature ranking. We propose a method to combine feature selection and classifier fusion in an incremental manner. Our test data set contains 720 RNA sequences, where 225 pseudoknotted RNA sequences are obtained from PseudoBase and 495 nested RNA sequences are obtained from RNA STRAND. The method serves as a preprocessing step for analyzing RNA sequences before RNA secondary structure prediction tools are employed. In addition, the performance of various configurations is subjected to statistical tests to examine their significance. The best base-pair accuracy achieved is 75.5%, obtained by the proposed incremental method, which is significantly higher than the 68.8% associated with the best single predictor, pknotsRG.
|
28
|
Abstract
Mammalian olfactory receptor families are segregated into different olfactory organs, with type 2 vomeronasal receptor (v2r) genes expressed in a basal layer of the vomeronasal epithelium. In contrast, teleost fish v2r genes are intermingled with all other olfactory receptor genes in a single sensory surface. We report here that, strikingly different from both lineages, the v2r gene family of the amphibian Xenopus laevis is expressed in the main olfactory as well as the vomeronasal epithelium. Interestingly, late diverging v2r genes are expressed exclusively in the vomeronasal epithelium, whereas "ancestral" v2r genes, including the single member of v2r family C, are restricted to the main olfactory epithelium. Moreover, within the main olfactory epithelium, v2r genes are expressed in a basal zone, partially overlapping, but clearly distinct from an apical zone of olfactory marker protein and odorant receptor-expressing cells. These zones are also apparent in the spatial distribution of odor responses, enabling a tentative assignment of odor responses to olfactory receptor gene families. Responses to alcohols, aldehydes, and ketones show an apical localization, consistent with being mediated by odorant receptors, whereas amino acid responses overlap extensively with the basal v2r-expressing zone. The unique bimodal v2r expression pattern in main and accessory olfactory system of amphibians presents an excellent opportunity to study the transition of v2r gene expression during evolution of higher vertebrates.
|
29
|
Study designs and statistical analyses for biomarker research. SENSORS 2012; 12:8966-86. [PMID: 23012528 PMCID: PMC3444086 DOI: 10.3390/s120708966] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/15/2012] [Revised: 06/21/2012] [Accepted: 06/21/2012] [Indexed: 01/19/2023]
Abstract
Biomarkers are becoming increasingly important for streamlining drug discovery and development. In addition, biomarkers are widely expected to be used as a tool for disease diagnosis, personalized medication, and surrogate endpoints in clinical research. In this paper, we highlight several important aspects related to study design and statistical analysis for clinical research incorporating biomarkers. We describe the typical and current study designs for exploring, detecting, and utilizing biomarkers. Furthermore, we introduce statistical issues such as confounding and multiplicity for statistical tests in biomarker research.
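One standard remedy for the multiplicity issue raised here is Holm's step-down procedure, which controls the familywise error rate and is uniformly more powerful than plain Bonferroni. A minimal sketch (not tied to any specific study design in the review):

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down multiple-testing procedure: sort the p-values
    in ascending order and compare the i-th smallest (0-indexed rank)
    against alpha / (m - i). Controls the familywise error rate.
    Returns, per input hypothesis, whether its null is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one comparison fails, all larger p-values fail
    return reject
```

For biomarker screens where some false positives are tolerable, false-discovery-rate procedures such as Benjamini-Hochberg are a common, less conservative alternative.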
|
30
|
Generalized Mantel-Haenszel procedures for 2 x J tables. ENVIRONMENTAL HEALTH PERSPECTIVES 1994; 102 Suppl 8:57-60. [PMID: 7851333 PMCID: PMC1566553 DOI: 10.1289/ehp.94102s857] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/14/2023]
Abstract
A generalization of the Mantel-Haenszel procedure to 2 x J (J > 2) tables is reviewed. Included are generalized Mantel-Haenszel tests, estimators of a common odds ratio, and a generalized Breslow-Day test for the homogeneity of odds ratios across strata.
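For orientation, the classic J = 2 case that these procedures generalize is the Mantel-Haenszel common odds ratio for a series of stratified 2 x 2 tables. A minimal sketch (the 2 x J generalizations themselves are more involved):

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio pooled across strata, for the
    classic series of 2 x 2 tables (the J = 2 special case). Each table
    is ((a, b), (c, d)) with rows = exposure, columns = outcome:
    OR_MH = sum_k(a_k * d_k / n_k) / sum_k(b_k * c_k / n_k)."""
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den
```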
|