1
Gonzalez-Jimenez D, Del-Olmo J, Poza J, Garramiola F, Madina P. Data-Driven Low-Frequency Oscillation Event Detection Strategy for Railway Electrification Networks. Sensors (Basel) 2022; 23:254. [PMID: 36616852 PMCID: PMC9824671 DOI: 10.3390/s23010254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 12/20/2022] [Accepted: 12/23/2022] [Indexed: 06/17/2023]
Abstract
Low-frequency oscillations (LFO) occur in railway electrification systems due to the incorporation of new trains with switching converters. As a result, the increased harmonic content can cause catenary stability problems under certain conditions. Most of the research published on this topic to date is focused on modelling the event and analysing it using frequency spectra. However, in recent years, due to the new technologies linked to Big Data (BD) and data mining (DM), a new opportunity to study and detect LFO events by means of machine-learning (ML) methods has emerged. Trains continuously collect data from the most important catenary variables, which offers new resources for analysing this type of event. Therefore, this article presents the design and implementation of a data-driven LFO event detection strategy for AC railway network scenarios. Compared to previous investigations, a new approach to analyse and detect LFO events, based on field data and ML, is presented. To obtain the most appropriate detection approach for the context of this application, on the one hand, this investigation includes a comparison of machine-learning algorithms (support vector machine, logistic regression, random forest, k-nearest neighbours, naïve Bayes) which have been trained with real field data. On the other hand, an analysis of key parameters and features to optimize event detection is also included. Thus, the most significant result of this work is the high metric values of the solution, reaching values above 97% in accuracy and 93% in F1 score with the random forest algorithm. In addition, the applicability and training of data-driven methods with real field data are demonstrated. This automatic detection strategy can help with speeding up and improving LFO detection tasks that used to be performed manually. Finally, it is worth mentioning that this research has been structured based on the CRISP-DM methodology, established as the de facto approach for industrial DM projects.
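The abstract reports both accuracy and F1 score. As a minimal sketch of how the two relate (not the authors' code), the snippet below computes both from confusion-matrix counts. The function name and the counts are hypothetical, chosen only to show that on an imbalanced event-detection task the F1 score of the event class can sit well below the overall accuracy.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Precision and recall are defined as 0 when their denominator is 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts: 1000 signal windows, 100 of which contain an event.
acc, f1 = accuracy_and_f1(tp=90, fp=10, fn=10, tn=890)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Because the many true negatives raise accuracy but do not enter the F1 formula, reporting both metrics gives a fairer picture when events are rare.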
2
Martins B, Ferreira D, Neto C, Abelha A, Machado J. Data Mining for Cardiovascular Disease Prediction. J Med Syst 2021; 45:6. [PMID: 33404894 DOI: 10.1007/s10916-020-01682-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/01/2020] [Accepted: 11/24/2020] [Indexed: 10/22/2022]
Abstract
Cardiovascular diseases (CVDs) are disorders of the heart and blood vessels and are a major cause of disability and premature death worldwide. Individuals at higher risk of developing CVD must be identified at an early stage to prevent premature deaths. Advances in the field of computational intelligence, together with the vast amount of data produced daily in clinical settings, have made it possible to create recognition systems capable of identifying hidden patterns and useful information. This paper focuses on the application of Data Mining Techniques (DMTs) to clinical data collected during the medical examination in an attempt to predict whether or not an individual has a CVD. To this end, the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed, in which five classifiers were applied, namely Decision Tree (DT), Optimized DT, Rule Induction (RI), Random Forest (RF), and Deep Learning (DL). The models were mainly developed using the RapidMiner software with the assistance of the WEKA tool and were analyzed based on accuracy, precision, sensitivity, and specificity. The results obtained were considered promising in the search for effective means of diagnosing CVD, with the best model being Optimized DT, which achieved the highest values for all the evaluation metrics: 73.54%, 75.82%, 68.89%, 78.16%, and 0.788 for accuracy, precision, sensitivity, specificity, and AUC, respectively.
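The abstract reports an AUC alongside threshold-based metrics. As an illustrative sketch only (the study used RapidMiner and WEKA, not this code), AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name, labels, and scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = has CVD, 0 = no CVD; scores are model probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on large datasets.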
3
Komenda M, Bulhart V, Karolyi M, Jarkovský J, Mužík J, Májek O, Šnajdrová L, Růžičková P, Rážová J, Prymula R, Macková B, Březovský P, Marounek J, Černý V, Dušek L. Complex Reporting of the COVID-19 Epidemic in the Czech Republic: Use of an Interactive Web-Based App in Practice. J Med Internet Res 2020; 22:e19367. [PMID: 32412422 PMCID: PMC7254961 DOI: 10.2196/19367] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 04/15/2020] [Revised: 05/14/2020] [Accepted: 05/14/2020] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND: The beginning of the coronavirus disease (COVID-19) epidemic dates back to December 31, 2019, when the first cases were reported in the People's Republic of China. In the Czech Republic, the first three cases of infection with the novel coronavirus were confirmed on March 1, 2020. The joint effort of state authorities and researchers gave rise to a unique team, which combines methodical knowledge of real-world processes with the know-how needed for effective processing, analysis, and online visualization of data.
OBJECTIVE: Due to an urgent need for a tool that presents important reports based on valid data sources, a team of government experts and researchers focused on the design and development of a web app intended to provide a regularly updated overview of COVID-19 epidemiology in the Czech Republic to the general population.
METHODS: The cross-industry standard process for data mining model was chosen for the complex solution of analytical processing and visualization of data that provides validated information on the COVID-19 epidemic across the Czech Republic. Great emphasis was put on the understanding and correct implementation of all six steps (business understanding, data understanding, data preparation, modelling, evaluation, and deployment) needed in the process, including the infrastructure of a nationwide information system; the methodological setting of communication channels between all involved stakeholders; and data collection, processing, analysis, validation, and visualization.
RESULTS: The web-based overview of the current spread of COVID-19 in the Czech Republic has been developed as an online platform providing a set of outputs in the form of tables, graphs, and maps intended for the general public. On March 11, 2020, the first version of the web portal, containing fourteen overviews divided into five topical sections, was released. The web portal's primary objective is to publish a well-arranged visualization and clear explanation of basic information consisting of the overall numbers of performed tests, confirmed cases of COVID-19, COVID-19-related deaths, the daily and cumulative overviews of people with a positive COVID-19 case, performed tests, location and country of infection of people with a positive COVID-19 case, hospitalizations of patients with COVID-19, and distribution of personal protective equipment.
CONCLUSIONS: The online interactive overview of the current spread of COVID-19 in the Czech Republic was launched on March 11, 2020, and has immediately become the primary communication channel employed by the health care sector to present the current situation regarding the COVID-19 epidemic. This complex reporting of the COVID-19 epidemic in the Czech Republic also shows an effective way to interconnect knowledge held by various specialists, such as regional and national methodology experts (who report positive cases of the disease on a daily basis), with knowledge held by developers of central registries, analysts, developers of web apps, and leaders in the health care sector.
Affiliation(s)
- Martin Komenda
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Vojtěch Bulhart
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Matěj Karolyi
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Jiří Jarkovský
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Jan Mužík
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Ondřej Májek
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Lenka Šnajdrová
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Petra Růžičková
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Jarmila Rážová
  - Ministry of Health of the Czech Republic, Prague, Czech Republic
- Roman Prymula
  - Ministry of Health of the Czech Republic, Prague, Czech Republic
- Jan Marounek
  - Ministry of Health of the Czech Republic, Prague, Czech Republic
- Vladimír Černý
  - Department of Anesthesiology, Perioperative Medicine and Intensive Care, Masaryk Hospital, Ústí nad Labem, Czech Republic; Jan Evangelista Purkyne University, Ústí nad Labem, Czech Republic
- Ladislav Dušek
  - Institute of Biostatistics and Analyses, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
4
Plotnikova V, Dumas M, Milani F. Adaptations of data mining methodologies: a systematic literature review. PeerJ Comput Sci 2020; 6:e267. [PMID: 33816918 PMCID: PMC7924527 DOI: 10.7717/peerj-cs.267] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/19/2019] [Accepted: 03/02/2020] [Indexed: 05/30/2023]
Abstract
The use of end-to-end data mining methodologies such as CRISP-DM, KDD process, and SEMMA has grown substantially over the past decade. However, little is known as to how these methodologies are used in practice. In particular, the question of whether data mining methodologies are used 'as-is' or adapted for specific purposes has not been thoroughly investigated. This article addresses this gap via a systematic literature review focused on the context in which data mining methodologies are used and the adaptations they undergo. The literature review covers 207 peer-reviewed and 'grey' publications. We find that data mining methodologies are primarily applied 'as-is'. At the same time, we also identify various adaptations of data mining methodologies and we note that their number is growing rapidly. The dominant adaptation pattern is related to methodology adjustments at a granular level (modifications), followed by extensions of existing methodologies with additional elements. Further, we identify two recurrent purposes for adaptation: (1) adaptations to handle Big Data technologies, tools and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). The study suggests that standard data mining methodologies do not pay sufficient attention to deployment issues, which play a prominent role when turning data mining models into software products that are integrated into the IT architectures and business processes of organizations. We conclude that refinements of existing methodologies aimed at combining data, technological, and organizational aspects could help to mitigate these gaps.
5
Wlosinska M, Nilsson AC, Hlebowicz J, Hauggaard A, Kjellin M, Fakhro M, Lindstedt S. The effect of aged garlic extract on the atherosclerotic process - a randomized double-blind placebo-controlled trial. BMC Complement Med Ther 2020; 20:132. [PMID: 32349742 PMCID: PMC7191741 DOI: 10.1186/s12906-020-02932-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/21/2019] [Accepted: 04/22/2020] [Indexed: 02/08/2023] Open
Abstract
Background: One of the most serious secondary manifestations of Cardiovascular Disease (CVD) is coronary atherosclerosis. This study aimed to evaluate whether aged garlic extract (AGE) can influence coronary artery calcification (CAC) and to predict the individual effect of AGE using a standard process for data mining (CRISP-DM).
Method: This was a single-center parallel randomized controlled study in a university hospital in Europe. Patients were randomized, in a double-blind manner, through a computer-generated randomization chart. Patients with a Framingham risk score ≥ 10 after CT scan (n = 104) were randomized to an intake of placebo or AGE (2400 mg daily) for 1 year. Main outcome measures were changes in CAC score, and secondary outcome measures were changes in blood pressure, fasting blood glucose, blood lipids and inflammatory biomarkers.
Result: 104 patients were randomized, and 46 in the active group and 47 in the placebo group were analyzed. There was a significant (p < 0.05) change in CAC progression (OR: 2.95 [1.05–8.27]), blood glucose (OR: 3.1 [1.09–8.85]) and IL-6 (OR 2.56 [1.00–6.53]) in favor of the active group. There was also a significant (p = 0.027) decrease in systolic blood pressure in the AGE group, from a mean of 148 (SD: 19) mmHg at 0 months, to 140 (SD: 15) mmHg after 12 months. For the AGE algorithm, at a selected probability cut-off value of 0.5, the accuracy for CAC progression was 80%, with a precision of 79% and a recall of 83%. The score for blood pressure was 74% (accuracy, precision and recall). There were no side-effects in either group.
Conclusions: AGE inhibits CAC progression and lowers IL-6, glucose levels and blood pressure in patients at increased risk of cardiovascular events in a European cohort. An algorithm was made and was used to predict with 80% precision which patient will have a significantly reduced CAC progression using AGE. The algorithm could also predict with 74% precision which patient will have a significant blood pressure lowering effect using AGE.
Trial registration: Clinical trials NCT03860350, retrospectively registered (1/3 2019).
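The results above are reported as odds ratios with 95% confidence intervals. As a hedged sketch of how such a figure can be derived from a 2x2 table, the snippet below uses the standard log-odds (Woolf) approximation. The abstract does not state which method the trial used, and the function name and counts are hypothetical, not trial data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a: treated with outcome      b: treated without outcome
    c: control with outcome      d: control without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(f"OR = {or_value:.2f} [{ci_low:.2f}-{ci_high:.2f}]")
```

An interval whose lower bound stays above 1 corresponds to a result that is significant at roughly the 5% level, which is how bounds such as 1.05 and 1.00 in the abstract can be read.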
Affiliation(s)
- Martiné Wlosinska
  - Department of Cardiothoracic Surgery and Transplantation, Clinical Sciences, Lund University, Skåne University Hospital, SE-221 85, Lund, Sweden
- Ann-Christin Nilsson
  - Department of Cardiothoracic Surgery and Transplantation, Clinical Sciences, Lund University, Skåne University Hospital, SE-221 85, Lund, Sweden
- Joanna Hlebowicz
  - Department of Cardiology, Clinical Sciences, Lund University, Skåne University Hospital, Lund, Sweden
- Anders Hauggaard
  - Department of Radiology, Cardiac Imaging, Skåne Hospital Northwest, Helsingborg, Sweden
- Maria Kjellin
  - Department of Radiology, Cardiac Imaging, Skåne Hospital Northwest, Helsingborg, Sweden
- Mohammed Fakhro
  - Department of Cardiothoracic Surgery and Transplantation, Clinical Sciences, Lund University, Skåne University Hospital, SE-221 85, Lund, Sweden
- Sandra Lindstedt
  - Department of Cardiothoracic Surgery and Transplantation, Clinical Sciences, Lund University, Skåne University Hospital, SE-221 85, Lund, Sweden
6
Cruz-Bermúdez JL, Parejo C, Martínez-Ruíz F, Sánchez-González JC, Ramos Martín-Vegue A, Royuela A, Rodríguez-González A, Menasalvas-Ruiz E, Provencio M. Applying Data Science methods and tools to unveil healthcare use of lung cancer patients in a teaching hospital in Spain. Clin Transl Oncol 2019; 21:1472-81. [PMID: 30864021 DOI: 10.1007/s12094-019-02074-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/31/2018] [Accepted: 02/25/2019] [Indexed: 10/27/2022]
Abstract
PURPOSE: Our primary goal was to study the use of outpatient attendances by lung cancer patients in Hospital Universitario Puerta de Hierro Majadahonda (HUPHM), Spain, by leveraging our Electronic Patient Record (EPR) and structured clinical registry of lung cancer cases, as well as to assess current Data Science methods and tools.
METHODS/PATIENTS: We applied the Cross-Industry Standard Process for Data Mining (CRISP-DM) to integrate and analyze activity data extracted from the EPR (9.3 million records) and clinical data of lung cancer patients from a previous registry that was curated into a new, structured database based on REDCap. We have described and quantified factors influencing outpatient care use from univariate and multivariate points of view (through Poisson and negative binomial regression).
RESULTS: Three cycles of CRISP-DM were performed, resulting in a curated database of 522 lung cancer patients with 133 variables; these patients generated 43,197 outpatient visits and tests, 1,538 ER visits and 753 inpatient admissions. Stage and ECOG-PS at diagnosis and Charlson Comorbidity Index were major contributors to healthcare use. We also found that the patients' pattern of healthcare use (even before diagnosis), the existence of a history of cancer in first-degree relatives, smoking habits, or even age at diagnosis could play a relevant role.
CONCLUSIONS: Integrating activity data from the EPR and structured clinical data from lung cancer patients and applying CRISP-DM has allowed us to describe healthcare use in connection with clinical variables that could be used to plan resources and improve quality of care.
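The methods mention both Poisson and negative binomial regression for visit counts. As a rough sketch of why both might be considered (an assumption about the general reasoning, not the authors' analysis), the snippet below computes a simple dispersion index: a Poisson model assumes the variance equals the mean, so a ratio well above 1 suggests overdispersion, for which a negative binomial model is a common alternative. The function name and visit counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson-like data)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m


# Hypothetical outpatient visit counts per patient.
visits = [2, 0, 15, 4, 40, 1, 7, 3, 60, 5]
index = dispersion_index(visits)
print(f"dispersion index = {index:.1f}")
```

This raw ratio ignores covariates; in a regression setting the check would normally be made on model residuals instead.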