Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Pezoulas VC, Grigoriadis GI, Gkois G, Tachos NS, Smole T, Bosnić Z, Pičulin M, Olivotto I, Barlocco F, Robnik-Šikonja M, Jakovljevic DG, Goules A, Tzioufas AG, Fotiadis DI. A computational pipeline for data augmentation towards the improvement of disease classification and risk stratification models: A case study in two clinical domains. Comput Biol Med 2021;134:104520. [PMID: 34118751 DOI: 10.1016/j.compbiomed.2021.104520] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/08/2021] [Revised: 05/13/2021] [Accepted: 05/24/2021] [Indexed: 11/20/2022]

Cited by Other Article(s)

Vallevik VB, Babic A, Marshall SE, Elvatun S, Brøgger HMB, Alagaratnam S, Edwin B, Veeraragavan NR, Befring AK, Nygård JF. Can I trust my fake data - A comprehensive quality assessment framework for synthetic tabular data in healthcare. Int J Med Inform 2024;185:105413. [PMID: 38493547 DOI: 10.1016/j.ijmedinf.2024.105413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/17/2024] [Accepted: 03/11/2024] [Indexed: 03/19/2024]

Abstract

BACKGROUND

Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. Synthetic data has been suggested in response to privacy concerns and regulatory requirements and can be created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been proposed, resulting in a complex landscape. Optimising quality entails balancing considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks.

METHOD

We performed a comprehensive literature review on the use of quality evaluation metrics on synthetic data within the scope of synthetic tabular healthcare data using deep generative methods. Based on this and the collective team experiences, we developed a conceptual framework for quality assurance. The applicability was benchmarked against a practical case from the Dutch National Cancer Registry.

CONCLUSION

We present a conceptual framework for quality assurance of synthetic data for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include the dimensions of Fairness and Carbon footprint, and proposes stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing the safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients.

DISCUSSION

Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the literature review. The overwhelming focus was on statistical similarity using distance metrics while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of synthetic data. As the choice of appropriate metrics is highly context dependent, further research is needed on validation studies to guide metric choices and support the development of technical standards.

Menegatti D, Giuseppi A, Delli Priscoli F, Pietrabissa A, Di Giorgio A, Baldisseri F, Mattioni M, Monaco S, Lanari L, Panfili M, Suraci V. CADUCEO: A Platform to Support Federated Healthcare Facilities through Artificial Intelligence. Healthcare (Basel) 2023;11:2199. [PMID: 37570439 PMCID: PMC10418332 DOI: 10.3390/healthcare11152199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 07/26/2023] [Accepted: 07/31/2023] [Indexed: 08/13/2023]

Pezoulas VC, Exarchos TP, Tachos NS, Goules A, Tzioufas AG, Fotiadis DI. Boosting the performance of MALT lymphoma classification in patients with primary Sjögren's Syndrome through data augmentation: a case study. Annu Int Conf IEEE Eng Med Biol Soc 2023;2023:1-4. [PMID: 38083761 DOI: 10.1109/embc40787.2023.10340802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]

Hameed MAB, Alamgir Z. Improving mortality prediction in Acute Pancreatitis by machine learning and data augmentation. Comput Biol Med 2022;150:106077. [PMID: 36137318 DOI: 10.1016/j.compbiomed.2022.106077] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/25/2022] [Revised: 08/28/2022] [Accepted: 09/03/2022] [Indexed: 11/22/2022]

Abstract

Acute Pancreatitis (AP) is the inflammation of the pancreas that can be fatal or lead to further complications based on the severity of the attack. Early detection of AP disease can help save lives by providing utmost care, rigorous treatment, and better resources. In this era of data and technology, instead of relying on manual scoring systems, scientists are employing advanced machine learning and data mining models for the early detection of patients with high chances of mortality. The current work on AP mortality prediction is negligible, and the few studies that exist have many shortcomings and are impractical for clinical deployment. In this research work, we tried to overcome the existing issues. One main issue is the lack of high-quality public datasets for AP, which are crucial for effectively training ML models. The available datasets are small in size, have many missing values, and suffer from high class imbalance. We augmented three public datasets, MIMIC-III, MIMIC-IV, and eICU, to obtain a larger dataset, and experiments proved that augmented data trained classifiers better than original small datasets. Moreover, we employed emerging advanced techniques to handle underlying issues in data. The results showed that iterative imputer is best for filling missing values in AP data. It beats not only the basic techniques but also the KNN-based imputation. Class imbalance is first addressed using data downsampling; apparently, it gave decent results on small test sets. However, we conducted numerous experiments on large test sets to prove that downsampling in the case of AP produced misleading and poor results. Next, we applied various techniques to upsample data in two different class splits, a 50 to 50 and a 70 to 30 majority-minority class split. Four different tabular generative adversarial networks, CTGAN, TGAN, CopulaGAN, and CTAB, and a variational autoencoder, TVAE, were deployed for synthetic data generation. SMOTE was also utilized for data upsampling. The computational results showed that the Random Forest (RF) classifier outperformed all other classifiers on a 50 to 50 class split data generated by CTGAN, with 0.702 Fβ and 0.833 recall. Results produced by RF on the TVAE dataset were also comparable, with 0.698 Fβ. In the case of SMOTE-based upsampling, DNN performed best with a 0.671 Fβ score.
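As an illustration of the arithmetic behind the class splits and the Fβ score mentioned in this abstract, the following is a minimal sketch, not the authors' code. The function names, the example class counts, and the choice of β = 2 are assumptions made here for illustration; the abstract does not state the value of β or the dataset sizes.

```python
def synthetic_minority_needed(n_majority, n_minority, majority_part, minority_part):
    """Synthetic minority rows required to reach a majority:minority split.

    The split is given as integer parts, e.g. (50, 50) or (70, 30).
    The majority class is left untouched; only minority rows are added.
    """
    if majority_part <= 0 or minority_part <= 0:
        raise ValueError("split parts must be positive integers")
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    target_minority = -(-n_majority * minority_part // majority_part)
    return max(0, target_minority - n_minority)


def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weights recall beta times as much as precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 900 majority-class rows and 100 minority-class rows.
    print(synthetic_minority_needed(900, 100, 50, 50))  # rows to add for 50:50
    print(synthetic_minority_needed(900, 100, 70, 30))  # rows to add for 70:30
    # Hypothetical precision and recall; beta=2 is an assumption.
    print(f_beta(0.5, 0.5, beta=2))
```

In practice the synthetic rows would come from a generator such as those named in the abstract; the sketch only shows how many rows a given target split implies.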

Pezoulas VC, Tachos NS, Olivotto I, Barlocco F, Fotiadis DI. A "smart" Imputation Approach for Effective Quality Control Across Complex Clinical Data Structures. Annu Int Conf IEEE Eng Med Biol Soc 2022;2022:1049-1052. [PMID: 36086027 DOI: 10.1109/embc48229.2022.9871919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Baeza-Delgado C, Cerdá Alberich L, Carot-Sierra JM, Veiga-Canuto D, Martínez de Las Heras B, Raza B, Martí-Bonmatí L. A practical solution to estimate the sample size required for clinical prediction models generated from observational research on data. Eur Radiol Exp 2022;6:22. [PMID: 35641659 PMCID: PMC9156610 DOI: 10.1186/s41747-022-00276-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/28/2021] [Accepted: 04/12/2022] [Indexed: 12/23/2022]

Wu H, Liang Q, Zhang W, Zou Q, El-Latif Hesham A, Liu B. iLncDA-LTR: Identification of lncRNA-disease associations by learning to rank. Comput Biol Med 2022;146:105605. [PMID: 35594681 DOI: 10.1016/j.compbiomed.2022.105605] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/01/2022] [Revised: 04/27/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022]

Guan Q, Chen Y, Wei Z, Heidari AA, Hu H, Yang XH, Zheng J, Zhou Q, Chen H, Chen F. Medical image augmentation for lesion detection using a texture-constrained multichannel progressive GAN. Comput Biol Med 2022;145:105444. [PMID: 35421795 DOI: 10.1016/j.compbiomed.2022.105444] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 10/13/2021] [Revised: 12/31/2021] [Accepted: 03/20/2022] [Indexed: 12/18/2022]