1
A dataset for predicting cloud cover over Europe. Sci Data 2024; 11:245. [PMID: 38413601] [PMCID: PMC10899574] [DOI: 10.1038/s41597-024-03062-0]
Abstract
Clouds are important factors when projecting future climate. Unfortunately, future cloud fractional cover (the portion of the sky covered by clouds) is associated with significant uncertainty, making climate projections difficult. In this paper, we present the European Cloud Cover dataset, which can be used to learn statistical relations between cloud cover and other environmental variables, to potentially improve future climate projections. The dataset was created using a novel technique called Area Weighting Regridding Scheme to map satellite observations to cloud fractional cover on the same grid as the other variables in the dataset. Baseline experiments using autoregressive models document that it is possible to use the dataset to predict cloud fractional cover.
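The baseline described above predicts cloud fractional cover autoregressively from its own past values. As an illustration only (not the authors' code), the sketch below fits a first-order autoregressive baseline independently per grid cell with NumPy; the array shapes, the lag-1 structure, and the random stand-in data are assumptions.

```python
import numpy as np

def fit_ar1_per_cell(cloud):
    """Fit y_t = a * y_{t-1} + b independently for every grid cell.

    cloud: array of shape (time, lat, lon) with cloud fractions in [0, 1].
    """
    x, y = cloud[:-1], cloud[1:]                      # lag-1 pairs along time
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    cov = ((x - x_mean) * (y - y_mean)).mean(axis=0)
    var = ((x - x_mean) ** 2).mean(axis=0)
    a = cov / np.where(var == 0, 1.0, var)            # guard against constant cells
    b = y_mean - a * x_mean
    return a, b

def predict_next(cloud_last, a, b):
    """One-step-ahead forecast, clipped to the valid cloud-fraction range."""
    return np.clip(a * cloud_last + b, 0.0, 1.0)

# toy usage with random data standing in for the gridded dataset
cloud = np.random.rand(100, 16, 16)
a, b = fit_ar1_per_cell(cloud)
forecast = predict_next(cloud[-1], a, b)
print(forecast.shape)  # (16, 16)
```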
2
Semantic representation and comparative analysis of physical activity sensor observations using MOX2-5 sensor in real and synthetic datasets: a proof-of-concept-study. Sci Rep 2024; 14:4634. [PMID: 38409365] [PMCID: PMC10897381] [DOI: 10.1038/s41598-024-55183-6]
Abstract
The widespread use of devices like mobile phones and wearables allows for automatic monitoring of human daily activities, generating vast datasets that offer insights into long-term human behavior. A structured and controlled data collection process is essential to unlock the full potential of this information. While wearable sensors for physical activity monitoring have gained significant traction in healthcare, sports science, and fitness applications, securing diverse and comprehensive datasets for research and algorithm development poses a notable challenge. In this proof-of-concept study, we underscore the significance of semantic representation in enhancing data interoperability and facilitating advanced analytics for physical activity sensor observations. Our approach focuses on enhancing the usability of physical activity datasets by employing a medical-grade (CE certified) sensor to generate synthetic datasets. Additionally, we provide insights into ethical considerations related to synthetic datasets. The study conducts a comparative analysis between real and synthetic activity datasets, assessing their effectiveness in mitigating model bias and promoting fairness in predictive analysis. We have created an ontology for semantically representing observations from physical activity sensors and conducted predictive analysis on data collected using MOX2-5 activity sensors. Until now, there has been a lack of publicly available physical activity datasets collected with the MOX2-5, a medical-grade (CE-certified) activity monitoring device. The MOX2-5 captures and transmits high-resolution data, including activity intensity, weight-bearing, sedentary, standing, low, moderate, and vigorous physical activity, as well as steps per minute. Our dataset consists of physical activity data collected from 16 adults (Male: 12; Female: 4) over a period of 30-45 days (approximately 1.5 months), yielding a relatively small volume of 539 records. To address this limitation, we employ various synthetic data generation methods, such as Gaussian Copula (GC), Conditional Tabular Generative Adversarial Network (CTGAN), and Tabular Generative Adversarial Network (TABGAN), to augment the dataset with synthetic data. For both the authentic and synthetic datasets, we have developed a Multilayer Perceptron (MLP) classification model for accurately classifying daily physical activity levels. The findings underscore the effectiveness of semantic ontology in semantic search, knowledge representation, data integration, reasoning, and capturing meaningful relationships between data. The analysis supports the hypothesis that the efficiency of predictive models improves as the volume of additional synthetic training data increases. Ontology and Generative AI hold the potential to expedite advancements in behavioral monitoring research. The data presented, encompassing both real MOX2-5 and its synthetic counterpart, serves as a valuable resource for developing robust methods in activity type classification. Furthermore, it opens avenues for exploration into research directions related to synthetic data, including model efficiency, detection of generated data, and considerations regarding data privacy.
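To make the real-plus-synthetic training setup concrete, here is a minimal sketch using scikit-learn's MLPClassifier as a stand-in for the paper's MLP; the feature columns, label coding, and synthetic rows below are placeholders, not the MOX2-5 dataset itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X_real = rng.normal(size=(539, 8))        # placeholder sensor features (intensity, steps/min, ...)
y_real = rng.integers(0, 3, size=539)     # placeholder activity-level labels (low/moderate/vigorous)
X_syn = rng.normal(size=(2000, 8))        # rows produced by a synthetic generator such as CTGAN
y_syn = rng.integers(0, 3, size=2000)

# hold out real data for testing; train on real + synthetic records
X_train, X_test, y_train, y_test = train_test_split(X_real, y_real, test_size=0.3, random_state=0)
X_aug = np.vstack([X_train, X_syn])
y_aug = np.concatenate([y_train, y_syn])

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))
clf.fit(X_aug, y_aug)
print(classification_report(y_test, clf.predict(X_test)))
```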
3
Understanding metric-related pitfalls in image analysis validation. Nat Methods 2024; 21:182-194. [PMID: 38347140] [DOI: 10.1038/s41592-023-02150-0]
Abstract
Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
4
Metrics reloaded: recommendations for image analysis validation. Nat Methods 2024; 21:195-212. [PMID: 38347141] [DOI: 10.1038/s41592-023-02151-z]
Abstract
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
5
Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge. Sci Rep 2024; 14:2032. [PMID: 38263232] [PMCID: PMC10805888] [DOI: 10.1038/s41598-024-52063-x]
Abstract
Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, appearance, and location makes the detection of polyps challenging. Moreover, colonoscopy surveillance and removal of polyps are highly operator-dependent procedures and occur in a highly complex organ topology. There is a high rate of missed detections and incomplete removal of colonic polyps. To assist in clinical procedures and reduce miss rates, automated methods for detecting and segmenting polyps using machine learning have been developed in recent years. However, the major drawback in most of these methods is their limited ability to generalise to out-of-sample unseen datasets from different centres, populations, modalities, and acquisition systems. To test this limitation rigorously, we, together with expert gastroenterologists, curated a multi-centre and multi-population dataset acquired from six different colonoscopy systems and challenged the computational expert teams to develop robust automated detection and segmentation methods in a crowd-sourcing Endoscopic computer vision challenge. This work puts forward rigorous generalisability tests and assesses the usability of devised deep learning methods in dynamic and actual clinical colonoscopy procedures. We analyse the results of four top performing teams for the detection task and five top performing teams for the segmentation task. Our analyses demonstrate that the top-ranking teams concentrated mainly on accuracy over the real-time performance required for clinical applicability. We further dissect the devised methods and provide an experiment-based hypothesis that reveals the need for improved generalisability to tackle diversity present in multi-centre datasets and routine clinical procedures.
6
Livestreaming Technology and Online Child Sexual Exploitation and Abuse: A Scoping Review. Trauma Violence Abuse 2024; 25:260-274. [PMID: 36727734] [PMCID: PMC10666494] [DOI: 10.1177/15248380221147564]
Abstract
Livestreaming of child sexual abuse (LSCSA) is an established form of online child sexual exploitation and abuse (OCSEA). However, only a limited body of research has examined this issue. The COVID-19 pandemic has accelerated internet use and user knowledge of livestreaming services, emphasizing the importance of understanding this crime. In this scoping review, existing literature was brought together through an iterative search of eight databases containing peer-reviewed journal articles, as well as grey literature. Records were eligible for inclusion if the primary focus was on livestream technology and OCSEA, with the child defined as eighteen years or younger. Fourteen of the 2,218 records were selected. The data were charted and divided into four categories: victims, offenders, legislation, and technology. Limited research, differences in terminology, study design, and population inclusion criteria present a challenge to drawing general conclusions on the current state of LSCSA. The records show that victims are predominantly female. The average livestream offender was found to be older than the average online child sexual abuse offender. However, it is unclear whether these findings are representative of the global population of livestream offenders. Furthermore, there appears to be a gap in what the records show on platforms and payment services used and current digital trends. The lack of a legal definition and privacy considerations pose a challenge to investigation, detection, and prosecution. The available data allow some insights into a potentially much larger issue.
7
Using machine learning model explanations to identify proteins related to severity of meibomian gland dysfunction. Sci Rep 2023; 13:22946. [PMID: 38135766] [PMCID: PMC10746717] [DOI: 10.1038/s41598-023-50342-7]
Abstract
Meibomian gland dysfunction is the most common cause of dry eye disease and leads to significantly reduced quality of life and social burdens. Because meibomian gland dysfunction results in impaired function of the tear film lipid layer, studying the expression of tear proteins might increase the understanding of the etiology of the condition. Machine learning is able to detect patterns in complex data. This study applied machine learning to classify levels of meibomian gland dysfunction from tear proteins. The aim was to investigate proteomic changes between groups with different severity levels of meibomian gland dysfunction, as opposed to only separating patients with and without this condition. An established feature importance method was used to identify the most important proteins for the resulting models. Moreover, a new method that can take the uncertainty of the models into account when creating explanations was proposed. By examining the identified proteins, potential biomarkers for meibomian gland dysfunction were discovered. The overall findings are largely confirmatory, indicating that the presented machine learning approaches are promising for detecting clinically relevant proteins. While this study provides valuable insights into proteomic changes associated with varying severity levels of meibomian gland dysfunction, it should be noted that it was conducted without a healthy control group. Future research could benefit from including such a comparison to further validate and extend the findings presented here.
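Permutation importance is one established feature-importance method of the kind the abstract refers to; whether it is the exact method used in the study is not stated, so the following scikit-learn sketch with placeholder protein data should be read as a generic illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# rows = tear samples, columns = protein abundances (placeholder data and names)
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 50))
y = rng.integers(0, 3, size=120)                 # three assumed severity levels of MGD
protein_names = [f"protein_{i}" for i in range(X.shape[1])]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
model = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)

# permutation importance on held-out data; repeats give a spread rather than a point estimate,
# which is one simple way to reflect uncertainty in the ranking
result = permutation_importance(model, X_te, y_te, n_repeats=30, random_state=1)
ranked = sorted(zip(protein_names, result.importances_mean, result.importances_std),
                key=lambda t: t[1], reverse=True)
for name, mean_imp, std_imp in ranked[:10]:
    print(f"{name}: {mean_imp:.4f} +/- {std_imp:.4f}")
```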
8
CELLULAR, A Cell Autophagy Imaging Dataset. Sci Data 2023; 10:806. [PMID: 37973836] [PMCID: PMC10654672] [DOI: 10.1038/s41597-023-02687-x]
Abstract
Cells in living organisms are dynamic compartments that continuously respond to changes in their environment to maintain physiological homeostasis. While basal autophagy exists in cells to aid in the regular turnover of intracellular material, autophagy is also a critical cellular response to stress, such as nutritional depletion. Conversely, the deregulation of autophagy is linked to several diseases, such as cancer, and hence, autophagy constitutes a potential therapeutic target. Image analysis to follow autophagy in cells, especially on high-content screens, has proven to be a bottleneck. Machine learning (ML) algorithms have recently emerged as crucial in analyzing images to efficiently extract information, thus contributing to a better understanding of the questions at hand. This paper presents CELLULAR, an open dataset consisting of images of cells expressing the autophagy reporter mRFP-EGFP-Atg8a with cell-specific segmentation masks. Each cell is annotated into either basal autophagy, activated autophagy, or unknown. Furthermore, we introduce some preliminary experiments using the dataset that can be used as a baseline for future research.
9
A Deep Diagnostic Framework Using Explainable Artificial Intelligence and Clustering. Diagnostics (Basel) 2023; 13:3413. [PMID: 37998548] [PMCID: PMC10670034] [DOI: 10.3390/diagnostics13223413]
Abstract
An important part of diagnostics is to gain insight into properties that characterize a disease. Machine learning has been used for this purpose, for instance, to identify biomarkers in genomics. However, when patient data are presented as images, identifying properties that characterize a disease becomes far more challenging. A common strategy involves extracting features from the images and analyzing their occurrence in healthy versus pathological images. A limitation of this approach is that the ability to gain new insights into the disease from the data is constrained by the information in the extracted features. Typically, these features are manually extracted by humans, which further limits the potential for new insights. To overcome these limitations, in this paper, we propose a novel framework that provides insights into diseases without relying on handcrafted features or human intervention. Our framework is based on deep learning (DL), explainable artificial intelligence (XAI), and clustering. DL is employed to learn deep patterns, enabling efficient differentiation between healthy and pathological images. XAI visualizes these patterns, and a novel "explanation-weighted" clustering technique is introduced to gain an overview of these patterns across multiple patients. We applied the method to images from the gastrointestinal tract. In addition to real healthy images and real images of polyps, some of the images had synthetic shapes added to represent other types of pathologies than polyps. The results show that our proposed method was capable of organizing the images based on the reasons they were diagnosed as pathological, achieving high cluster quality and a Rand index close to or equal to one.
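The abstract does not spell out the clustering algorithm, so the sketch below is one plausible reading of "explanation-weighted" clustering: each image's feature vector is weighted by its explanation scores before clustering, and the resulting clusters are compared against the diagnosis reasons with the adjusted Rand index. All arrays here are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n_images, n_features = 200, 32

features = rng.normal(size=(n_images, n_features))        # deep features per image (placeholder)
explanations = rng.random(size=(n_images, n_features))    # XAI relevance scores per feature (placeholder)
true_reason = rng.integers(0, 3, size=n_images)           # why each image was called pathological

# "explanation-weighted" representation: emphasize the features the model actually relied on
weighted = features * explanations

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(weighted)
print("adjusted Rand index:", adjusted_rand_score(true_reason, labels))
```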
10
FANet: A Feedback Attention Network for Improved Biomedical Image Segmentation. IEEE Trans Neural Netw Learn Syst 2023; 34:9375-9388. [PMID: 35333723] [DOI: 10.1109/tnnls.2022.3159394]
Abstract
The growing availability of large clinical and experimental datasets has led to substantial and important contributions in the area of biomedical image analysis. Image segmentation, which is crucial for any quantitative analysis, has especially attracted attention. Recent hardware advancement has led to the success of deep learning approaches. However, although deep learning models are being trained on large datasets, existing methods do not use the information from different learning epochs effectively. In this work, we leverage the information of each training epoch to prune the prediction maps of the subsequent epochs. We propose a novel architecture called feedback attention network (FANet) that unifies the previous epoch mask with the feature map of the current training epoch. The previous epoch mask is then used to provide hard attention to the learned feature maps at different convolutional layers. The network also allows rectifying the predictions in an iterative fashion at test time. We show that our proposed feedback attention model provides a substantial improvement on most segmentation metrics tested on seven publicly available biomedical imaging datasets, demonstrating the effectiveness of FANet. The source code is available at https://github.com/nikhilroxtomar/FANet.
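For readers unfamiliar with the hard-attention idea described above, here is a schematic PyTorch sketch of gating feature maps with a mask stored from the previous epoch. It is a simplified assumption of how such gating can look, not the published FANet architecture; the exact fusion, layers, and shapes in the paper differ (see the linked repository for the real implementation).

```python
import torch
import torch.nn.functional as F

def apply_mask_attention(features, prev_mask):
    """Gate feature maps with the prediction mask stored from the previous epoch.

    features:  (N, C, H, W) feature maps from the current forward pass
    prev_mask: (N, 1, h, w) previous-epoch probability map in [0, 1]
    """
    mask = F.interpolate(prev_mask, size=features.shape[-2:], mode="nearest")
    return features * mask  # hard attention: responses outside the mask are suppressed

# toy usage
feats = torch.randn(2, 64, 32, 32)
prev = (torch.rand(2, 1, 128, 128) > 0.5).float()
gated = apply_mask_attention(feats, prev)
print(gated.shape)  # torch.Size([2, 64, 32, 32])
```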
11
A systematic review and knowledge mapping on ICT-based remote and automatic COVID-19 patient monitoring and care. BMC Health Serv Res 2023; 23:1047. [PMID: 37777722] [PMCID: PMC10543863] [DOI: 10.1186/s12913-023-10047-z]
Abstract
BACKGROUND
e-Health has played a crucial role during the COVID-19 pandemic in primary health care. e-Health is the cost-effective and secure use of Information and Communication Technologies (ICTs) to support health and health-related fields. Various stakeholders worldwide use ICTs, including individuals, non-profit organizations, health practitioners, and governments. As a result of the COVID-19 pandemic, ICT has improved the quality of healthcare, the exchange of information, training of healthcare professionals and patients, and facilitated the relationship between patients and healthcare providers. This study systematically reviews the literature on ICT-based automatic and remote monitoring methods, as well as different ICT techniques used in the care of COVID-19-infected patients.
OBJECTIVE
The purpose of this systematic literature review is to identify the e-Health methods, associated ICTs, method implementation strategies, information collection techniques, advantages, and disadvantages of remote and automatic patient monitoring and care in the COVID-19 pandemic.
METHODS
The search included primary studies that were published between January 2020 and June 2022 in scientific and electronic databases, such as EBSCOhost, Scopus, ACM, Nature, SpringerLink, IEEE Xplore, MEDLINE, Google Scholar, JMIR, Web of Science, Science Direct, and PubMed. In this review, the findings from the included publications are presented and elaborated according to the identified research questions. Evidence-based systematic reviews and meta-analyses were conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Additionally, we improved the review process using the Rayyan tool and the Scale for the Assessment of Narrative Review Articles (SANRA). Among the eligibility criteria were methodological rigor, conceptual clarity, and useful implementation of ICTs in e-Health for remote and automatic monitoring of COVID-19 patients.
RESULTS
Our initial search identified 664 potential studies; 102 were assessed for eligibility in the pre-final stage and 65 articles were used in the final review with the inclusion and exclusion criteria. The review identified the following e-Health methods: Telemedicine, Mobile Health (mHealth), and Telehealth. The associated ICTs are Wearable Body Sensors, Artificial Intelligence (AI) algorithms, Internet-of-Things or Internet-of-Medical-Things (IoT or IoMT), Biometric Monitoring Technologies (BioMeTs), and Bluetooth-enabled (BLE) home health monitoring devices. Spatial or positional data, and personal health and wellness data, including vital signs, symptoms, biomedical images and signals, and lifestyle data, are examples of information that is managed by ICTs. Different AI and IoT methods have opened new possibilities for automatic and remote patient monitoring, with associated advantages and weaknesses. Our findings were represented in a structured manner using a semantic knowledge graph (e.g., ontology model).
CONCLUSIONS
Various e-Health methods, related remote monitoring technologies, different approaches, information categories, the adoption of ICT tools for automatic remote patient monitoring (RPM), and the advantages and limitations of remote monitoring technologies in the COVID-19 case are discussed in this review. The use of e-Health during the COVID-19 pandemic illustrates the constraints and possibilities of using ICTs. ICTs are not merely an external tool to achieve definite remote and automatic health monitoring goals; instead, they are embedded in contexts. Therefore, the importance of the mutual design process between ICT and society during the global health crisis has been observed from a social informatics perspective. A global health crisis can be observed as an information crisis (e.g., insufficient information, unreliable information, and inaccessible information); however, this review shows the influence of ICTs on COVID-19 patients' health monitoring and related information collection techniques.
12
Sperm motility assessed by deep convolutional neural networks into WHO categories. Sci Rep 2023; 13:14777. [PMID: 37679484] [PMCID: PMC10484948] [DOI: 10.1038/s41598-023-41871-2]
Abstract
Semen analysis is central in infertility investigation. Manual assessment of sperm motility according to the WHO recommendations is the gold standard, and extensive training is a requirement for accurate and reproducible results. Deep convolutional neural networks (DCNN) are especially suitable for image classification. In this study, we evaluated the performance of the DCNN ResNet-50 in predicting the proportion of sperm in the WHO motility categories. Two models were evaluated using tenfold cross-validation with 65 video recordings of wet semen preparations from an external quality assessment programme for semen analysis. The corresponding manually assessed data was obtained from several of the reference laboratories, and the mean values were used for training of the DCNN models. One model was trained to predict the three categories progressive motility, non-progressive motility, and immotile spermatozoa. Another model was trained to predict four categories, where progressive motility was differentiated into rapid and slow. The resulting average mean absolute error (MAE) was 0.05 and 0.07, and the average ZeroR baseline was 0.09 and 0.10 for the three-category and the four-category model, respectively. Manual and DCNN-predicted motility was compared by Pearson's correlation coefficient and by difference plots. The strongest correlation between the mean manually assessed values and DCNN-predicted motility was observed for % progressively motile spermatozoa (Pearson's r = 0.88, p < 0.001) and % immotile spermatozoa (r = 0.89, p < 0.001). For rapid progressive motility, the correlation was moderate (Pearson's r = 0.673, p < 0.001). The median difference between manual and predicted values was 0 for progressive motility and 2 for immotile spermatozoa. The largest bias was observed at high and low percentages of progressive and immotile spermatozoa. The DCNN-predicted value was within the range of the interlaboratory variation of the results for most of the samples. In conclusion, DCNN models were able to predict the proportion of spermatozoa into the WHO motility categories with significantly lower error than the baseline. The best correlation between the manual and the DCNN-predicted motility values was found for the categories progressive and immotile. Of note, there was considerable variation between the mean motility values obtained for each category by the reference laboratories, especially for rapid progressive motility, which impacts the training of the DCNN models.
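The MAE and ZeroR figures above compare the model against a trivial predictor that always outputs the mean of the training targets. A small sketch of how such a comparison can be computed, using made-up motility fractions rather than the study's data, is shown below.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# made-up fractions of progressively motile sperm (targets in [0, 1])
y_true = np.array([0.42, 0.55, 0.31, 0.60, 0.48])
y_model = np.array([0.40, 0.50, 0.35, 0.58, 0.45])          # hypothetical DCNN predictions
y_zeror = np.full_like(y_true, y_true.mean())               # ZeroR baseline: always predict the mean

print("model MAE :", mean_absolute_error(y_true, y_model))  # a useful model should beat the baseline
print("ZeroR MAE :", mean_absolute_error(y_true, y_zeror))
```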
13
A field assessment of child abuse investigators' engagement with a child-avatar to develop interviewing skills. Child Abuse Negl 2023; 143:106324. [PMID: 37390589] [DOI: 10.1016/j.chiabu.2023.106324]
Abstract
BACKGROUND Child investigative interviewing is a complex skill requiring specialised training. A critical training element is practice. Simulations with digital avatars are cost-effective options for delivering training. This study of real-world data provides novel insights by evaluating a large number of trainees' engagement with LiveSimulation (LiveSim), an online child-avatar that involves a trainee selecting a question (i.e., an option-tree) and the avatar responding with the level of detail appropriate for the question type. While LiveSim has been shown to facilitate learning of open-ended questions, its utility (from a user engagement perspective) remains to be examined. OBJECTIVE We evaluated trainees' engagement with LiveSim, focusing on patterns of interaction (e.g., amount), appropriateness of the prompt structure, and the programme's technical compatibility. PARTICIPANTS AND SETTING Professionals (N = 606, mainly child protection workers and police) who were offered the avatar as part of an intensive course on how to interview a child conducted between 2009 and 2018. METHODS For descriptive analysis, Visual Basic for Applications coding in Excel was applied to evaluate engagement and internal attributes of LiveSim. A compatibility study of the programme was run, testing different hardware with a focus on access and function. RESULTS The trainees demonstrated good engagement with the programme across a variety of measures, including number and timing of activity completions. Overall, our results provide strong support for the notion that even a technically simple avatar like LiveSim can awaken user engagement. This is important knowledge for the further development of learning simulations using next-generation technology.
14
Enhancing questioning skills through child avatar chatbot training with feedback. Front Psychol 2023; 14:1198235. [PMID: 37519386] [PMCID: PMC10374201] [DOI: 10.3389/fpsyg.2023.1198235]
Abstract
Training child investigative interviewing skills is a specialized task. Those being trained need opportunities to practice their skills in realistic settings and receive immediate feedback. A key step in ensuring the availability of such opportunities is to develop a dynamic, conversational avatar, using artificial intelligence (AI) technology that can provide implicit and explicit feedback to trainees. In the iterative process, use of a chatbot avatar to test the language and conversation model is crucial. The model is fine-tuned with interview data and realistic scenarios. This study used a pre-post training design to assess the learning effects on questioning skills across four child interview sessions that involved training with a child avatar chatbot fine-tuned with interview data and realistic scenarios. Thirty university students from the areas of child welfare, social work, and psychology were divided into two groups; one group received direct feedback (n = 12), whereas the other received no feedback (n = 18). An automatic coding function in the language model identified the question types. Information on question types was provided as feedback in the direct feedback group only. The scenario included a 6-year-old girl being interviewed about alleged physical abuse. After the first interview session (baseline), all participants watched a video lecture on memory, witness psychology, and questioning before they conducted two additional interview sessions and completed a post-experience survey. One week later, they conducted a fourth interview and completed another post-experience survey. All chatbot transcripts were coded for interview quality. The language model's automatic feedback function was found to be highly reliable in classifying question types, reflecting the substantial agreement among the raters [Cohen's kappa (κ) = 0.80] in coding open-ended, cued recall, and closed questions. Participants who received direct feedback showed a significantly higher improvement in open-ended questioning than those in the non-feedback group, with a significant increase in the number of open-ended questions used between the baseline and each of the other three chat sessions. This study demonstrates that child avatar chatbot training improves interview quality with regard to recommended questioning, especially when combined with direct feedback on questioning.
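Agreement between the automatic question-type coder and human raters is summarized above with Cohen's kappa. The snippet below shows how such a value can be computed with scikit-learn; the ten question codes are hypothetical, not transcripts from the study.

```python
from sklearn.metrics import cohen_kappa_score

# hypothetical codes for ten interviewer questions: open-ended (O), cued recall (R), closed (C)
human_rater = ["O", "C", "R", "O", "O", "C", "R", "O", "C", "O"]
model_coder = ["O", "C", "R", "O", "C", "C", "R", "O", "C", "O"]

kappa = cohen_kappa_score(human_rater, model_coder)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level agreement
```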
15
Usefulness of Heat Map Explanations for Deep-Learning-Based Electrocardiogram Analysis. Diagnostics (Basel) 2023; 13:2345. [PMID: 37510089] [PMCID: PMC10378376] [DOI: 10.3390/diagnostics13142345]
Abstract
Deep neural networks are complex machine learning models that have shown promising results in analyzing high-dimensional data such as those collected from medical examinations. Such models have the potential to provide fast and accurate medical diagnoses. However, the high complexity makes deep neural networks and their predictions difficult to understand. Providing model explanations can be a way of increasing the understanding of "black box" models and building trust. In this work, we applied transfer learning to develop a deep neural network to predict sex from electrocardiograms. Using the visual explanation method Grad-CAM, heat maps were generated from the model in order to understand how it makes predictions. To evaluate the usefulness of the heat maps and determine if the heat maps identified electrocardiogram features that could be recognized to discriminate sex, medical doctors provided feedback. Based on the feedback, we concluded that, in our setting, this mode of explainable artificial intelligence does not provide meaningful information to medical doctors and is not useful in the clinic. Our results indicate that improved explanation techniques that are tailored to medical data should be developed before deep neural networks can be applied in the clinic for diagnostic purposes.
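Grad-CAM, the explanation method named above, weights the activations of a chosen convolutional layer by the gradients of the target output and keeps the positive part. The generic PyTorch sketch below illustrates the idea for a 2D convolutional layer; the model, layer choice, and class index are placeholders, not the study's ECG network.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, x, class_idx=0):
    """Return a Grad-CAM map for one input batch x of shape (1, C, H, W)."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        model.zero_grad()
        out = model(x)
        out[0, class_idx].backward()                      # gradient of the chosen output
        a, g = acts["a"], grads["g"]                      # activations and gradients, (1, K, H, W)
        weights = g.mean(dim=(2, 3), keepdim=True)        # global-average-pooled gradients
        cam = F.relu((weights * a).sum(dim=1, keepdim=True))
        return (cam / (cam.max() + 1e-8)).squeeze().detach()  # normalized heat map in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```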
16
VISEM-Tracking, a human spermatozoa tracking dataset. Sci Data 2023; 10:260. [PMID: 37156762] [PMCID: PMC10167330] [DOI: 10.1038/s41597-023-02173-4]
Abstract
A manual assessment of sperm motility requires microscopy observation, which is challenging due to the fast-moving spermatozoa in the field of view. To obtain correct results, manual evaluation requires extensive training. Therefore, computer-aided sperm analysis (CASA) has become increasingly used in clinics. Despite this, more data is needed to train supervised machine learning approaches in order to improve accuracy and reliability in the assessment of sperm motility and kinematics. In this regard, we provide a dataset called VISEM-Tracking with 20 video recordings of 30 seconds (comprising 29,196 frames) of wet semen preparations with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. In addition to the annotated data, we provide unlabeled video clips for easy-to-use access and analysis of the data via methods such as self- or unsupervised learning. As part of this paper, we present baseline sperm detection performances using the YOLOv5 deep learning (DL) model trained on the VISEM-Tracking dataset. As a result, we show that the dataset can be used to train complex DL models to analyze spermatozoa.
17
GridHTM: Grid-Based Hierarchical Temporal Memory for Anomaly Detection in Videos. Sensors (Basel) 2023; 23:2087. [PMID: 36850686] [PMCID: PMC9961912] [DOI: 10.3390/s23042087]
Abstract
The interest in video anomaly detection systems that can detect different types of anomalies, such as violent behaviours in surveillance videos, has gained traction in recent years. The current approaches employ deep learning to perform anomaly detection in videos, but this approach has multiple problems. For example, deep learning in general has issues with noise, concept drift, explainability, and training data volumes. Additionally, anomaly detection in itself is a complex task and faces challenges such as unknownness, heterogeneity, and class imbalance. Anomaly detection using deep learning is therefore mainly constrained to generative models such as generative adversarial networks and autoencoders due to their unsupervised nature; however, even they suffer from general deep learning issues and are hard to properly train. In this paper, we explore the capabilities of the Hierarchical Temporal Memory (HTM) algorithm to perform anomaly detection in videos, as it has favorable properties such as noise tolerance and online learning which combats concept drift. We introduce a novel version of HTM, named GridHTM, which is a grid-based HTM architecture specifically for anomaly detection in complex videos such as surveillance footage. We have tested GridHTM using the VIRAT video surveillance dataset, and the subsequent evaluation results and online learning capabilities prove the great potential of using our system for real-time unsupervised anomaly detection in complex videos.
18
Predicting an unstable tear film through artificial intelligence. Sci Rep 2022; 12:21416. [PMID: 36496510] [PMCID: PMC9741582] [DOI: 10.1038/s41598-022-25821-y]
Abstract
Dry eye disease is one of the most common ophthalmological complaints and is defined by a loss of tear film homeostasis. Establishing a diagnosis can be time-consuming, resource demanding and unpleasant for the patient. In this pilot study, we retrospectively included clinical data from 431 patients with dry eye disease examined in the Norwegian Dry Eye Clinic to evaluate how artificial intelligence algorithms perform on clinical data related to dry eye disease. The data was processed and subjected to numerous machine learning classification algorithms with the aim to predict decreased tear film break-up time. Moreover, feature selection techniques (information gain and information gain ratio) were applied to determine which clinical factors contribute most to an unstable tear film. The applied machine learning algorithms outperformed baseline classifications performed with ZeroR according to included evaluation metrics. Clinical features such as ocular surface staining, meibomian gland expressibility and dropout, blink frequency, osmolarity, meibum quality and symptom score were recognized as important predictors for tear film instability. We identify and discuss potential limitations and pitfalls.
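The information gain used above for feature selection corresponds to the mutual information between a clinical feature and the decreased/normal break-up time label. A hedged scikit-learn sketch with placeholder measurements and feature names is shown below; it is not the clinic's data.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
feature_names = ["osmolarity", "blink_frequency", "staining_score",
                 "meibum_quality", "gland_dropout", "symptom_score"]
X = rng.normal(size=(431, len(feature_names)))       # placeholder clinical measurements
y = rng.integers(0, 2, size=431)                      # 1 = decreased tear film break-up time

# information gain (mutual information) of each feature with respect to the label
scores = mutual_info_classif(X, y, random_state=0)
for name, score in sorted(zip(feature_names, scores), key=lambda t: t[1], reverse=True):
    print(f"{name}: {score:.3f}")
```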
19
Towards the Neuroevolution of Low-level artificial general intelligence. Front Robot AI 2022; 9:1007547. [PMID: 36313249] [PMCID: PMC9613950] [DOI: 10.3389/frobt.2022.1007547]
Abstract
In this work, we argue that the search for Artificial General Intelligence should start from a much lower level than human-level intelligence. The circumstances of intelligent behavior in nature resulted from an organism interacting with its surrounding environment, which could change over time and exert pressure on the organism to allow for learning of new behaviors or environment models. Our hypothesis is that learning occurs through interpreting sensory feedback when an agent acts in an environment. For that to happen, a body and a reactive environment are needed. We evaluate a method for evolving a biologically inspired artificial neural network that learns from environment reactions, named Neuroevolution of Artificial General Intelligence, a framework for low-level artificial general intelligence. This method allows the evolutionary complexification of a randomly initialized spiking neural network with adaptive synapses, which controls agents instantiated in mutable environments. Such a configuration allows us to benchmark the adaptivity and generality of the controllers. The chosen tasks in the mutable environments are food foraging, emulation of logic gates, and cart-pole balancing. The three tasks are successfully solved with rather small network topologies, which opens up the possibility of experimenting with more complex tasks and scenarios where curriculum learning is beneficial.
20
Poster Session 1: Baseline filtering alleviates generalization issues for neural networks for electrocardiogram analysis. J Electrocardiol 2022. [DOI: 10.1016/j.jelectrocard.2022.07.041]
21
Video Analytics in Elite Soccer: A Distributed Computing Perspective. Proceedings of the ... IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM) 2022; 2022:221-225. [PMID: 36818954] [PMCID: PMC9931798] [DOI: 10.1109/sam53842.2022.9827827]
Abstract
Ubiquitous sensors and Internet of Things (IoT) technologies have revolutionized the sports industry, providing new methodologies for planning, effective coordination of training, and match analysis post game. New methods, including machine learning, image and video processing, have been developed for performance evaluation, allowing the analyst to track the performance of a player in real-time. Following FIFA's 2015 approval of electronic performance and tracking systems during games, performance data of a single player or the entire team may be collected using GPS-based wearables. Data from practice sessions outside the sporting arena is being collected in greater volumes than ever before. Realizing the significance of data in professional soccer, this paper presents video analytics, examines recent state-of-the-art literature in elite soccer, and summarizes existing real-time video analytics algorithms. We also discuss real-time crowdsourcing of the obtained data, tactical and technical performance, and distributed computing and its importance in video analytics, and we propose a future research perspective.
22
SinGAN-Seg: Synthetic training data generation for medical image segmentation. PLoS One 2022; 17:e0267976. [PMID: 35500005] [PMCID: PMC9060378] [DOI: 10.1371/journal.pone.0267976]
Abstract
Analyzing medical data to find abnormalities is a time-consuming and costly task, particularly for rare abnormalities, requiring tremendous efforts from medical experts. Therefore, artificial intelligence has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. However, the machine learning models used to build these tools are highly dependent on the data used to train them. Large amounts of data can be difficult to obtain in medicine due to privacy reasons, expensive and time-consuming annotations, and a general lack of data samples for infrequent lesions. In this study, we present a novel synthetic data generation pipeline, called SinGAN-Seg, to produce synthetic medical images with corresponding masks using a single training image. Our method is different from the traditional generative adversarial networks (GANs) because our model needs only a single image and the corresponding ground truth to train. We also show that the synthetic data generation pipeline can be used to produce alternative artificial segmentation datasets with corresponding ground truth masks when real datasets cannot be shared. The pipeline is evaluated using qualitative and quantitative comparisons between real data and synthetic data to show that the style transfer technique used in our pipeline significantly improves the quality of the generated data and that our method is better than other state-of-the-art GANs at preparing synthetic images when the size of training datasets is limited. By training UNet++ using both real data and the synthetic data generated from the SinGAN-Seg pipeline, we show that the models trained on synthetic data have very close performances to those trained on real data when both datasets have a considerable amount of training data. In contrast, we show that synthetic data generated from the SinGAN-Seg pipeline improves the performance of segmentation models when training datasets do not have a considerable amount of data. All experiments were performed using an open dataset and the code is publicly available on GitHub.
23
Efficient quantile tracking using an oracle. Appl Intell 2022. [DOI: 10.1007/s10489-022-03489-1]
Abstract
Concept drift is a well-known issue that arises when working with data streams. In this paper, we present a procedure that allows a quantile tracking procedure to cope with concept drift. We suggest using expected quantile loss, a popular loss function in quantile regression, to monitor the quantile tracking error, which, in turn, is used to efficiently adapt to concept drift. The suggested procedures adapt efficiently to concept drift, and the tracking performance is close to theoretically optimal. The procedures were further applied to three real-life streaming data sets related to Twitter event detection, activity recognition, and stock trading. The results show that the procedures are efficient at adapting to concept drift, thereby documenting the real-world applicability of the procedures. We further used asymptotic theory from statistics to show the appealing theoretical property that, if the data stream distribution is stationary over time, the procedures converge to the true quantile.
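To make the ingredients concrete: a standard incremental quantile tracker nudges its estimate up or down depending on the sign of the tracking error, and the expected quantile (pinball) loss can be monitored alongside it. The step size and simple update rule below are illustrative choices, not the paper's oracle-based procedure.

```python
import random

def pinball_loss(x, q, tau):
    """Expected quantile (pinball) loss contribution of one observation."""
    return tau * (x - q) if x >= q else (1 - tau) * (q - x)

def track_quantile(stream, tau=0.9, step=0.05):
    """Simple online tracker: stochastic-gradient update on the pinball loss."""
    q = 0.0
    losses = []
    for x in stream:
        losses.append(pinball_loss(x, q, tau))
        q += step * (tau - (1.0 if x < q else 0.0))  # move up when x >= q, down otherwise
    return q, losses

random.seed(0)
# synthetic stream with concept drift at the midpoint (mean shifts from 0 to 3)
data = [random.gauss(0, 1) for _ in range(5000)] + [random.gauss(3, 1) for _ in range(5000)]
q, losses = track_quantile(data)
print("final 0.9-quantile estimate:", round(q, 2))
```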
24
On evaluation metrics for medical applications of artificial intelligence. Sci Rep 2022; 12:5979. [PMID: 35395867] [PMCID: PMC8993826] [DOI: 10.1038/s41598-022-09954-8]
Abstract
Clinicians and software developers need to understand how proposed machine learning (ML) models could improve patient care. No single metric captures all the desirable properties of a model, which is why several metrics are typically reported to summarize a model’s performance. Unfortunately, these measures are not easily understandable by many clinicians. Moreover, comparison of models across studies in an objective manner is challenging, and no tool exists to compare models using the same performance metrics. This paper looks at previous ML studies done in gastroenterology, provides an explanation of what different metrics mean in the context of binary classification in the presented studies, and gives a thorough explanation of how different metrics should be interpreted. We also release an open source web-based tool that may be used to aid in calculating the most relevant metrics presented in this paper so that other researchers and clinicians may easily incorporate them into their research.
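For binary classification, most of the metrics the paper discusses can be derived from the confusion matrix. The small sketch below uses made-up predictions to illustrate the calculations; it does not reproduce the released web-based tool.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, confusion_matrix)

# made-up ground truth and model output for a binary task (1 = lesion present)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
```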
25
Meta-learning with implicit gradients in a few-shot setting for medical image segmentation. Comput Biol Med 2022; 143:105227. [PMID: 35124439] [DOI: 10.1016/j.compbiomed.2022.105227]
Abstract
Widely used traditional supervised deep learning methods require a large number of training samples but often fail to generalize on unseen datasets. Therefore, a more general application of any trained model is quite limited for medical imaging for clinical practice. Using separately trained models for each unique lesion category or a unique patient population will require sufficiently large curated datasets, which is not practical to use in a real-world clinical set-up. Few-shot learning approaches can not only minimize the need for an enormous number of reliable ground truth labels that are labour-intensive and expensive, but can also be used to model on a dataset coming from a new population. To this end, we propose to exploit an optimization-based implicit model agnostic meta-learning (iMAML) algorithm under few-shot settings for medical image segmentation. Our approach can leverage the learned weights from diverse but small training samples to perform analysis on unseen datasets with high accuracy. We show that, unlike classical few-shot learning approaches, our method improves generalization capability. To our knowledge, this is the first work that exploits iMAML for medical image segmentation and explores the strength of the model on scenarios such as meta-training on unique and mixed instances of lesion datasets. Our quantitative results on publicly available skin and polyp datasets show that the proposed method outperforms the naive supervised baseline model and two recent few-shot segmentation approaches by large margins. In addition, our iMAML approach shows an improvement of 2%-4% in dice score compared to its counterpart MAML for most experiments.
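The Dice score used above to compare the meta-learning approaches measures the overlap between a predicted and a ground-truth mask. A minimal NumPy implementation with toy masks (not the skin or polyp datasets) is sketched below.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient for binary masks: 2 * |A intersect B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# toy masks standing in for lesion segmentations
pred = np.zeros((64, 64), dtype=bool)
pred[16:48, 16:48] = True
target = np.zeros((64, 64), dtype=bool)
target[20:52, 20:52] = True
print(f"Dice: {dice_score(pred, target):.3f}")
```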
26
Abstract
During the past decades, many automated image analysis methods have been developed for colonoscopy. Real-time implementation of the most promising methods during colonoscopy has been tested in clinical trials, including several recent multi-center studies. All trials have shown results that may contribute to prevention of colorectal cancer. We summarize the past and present development of colonoscopy video analysis methods, focusing on two categories of artificial intelligence (AI) technologies used in clinical trials. These are (1) analysis and feedback for improving colonoscopy quality and (2) detection of abnormalities. Our survey includes methods that use traditional machine learning algorithms on carefully designed hand-crafted features as well as recent deep-learning methods. Lastly, we present the gap between current state-of-the-art technology and desirable clinical features and conclude with future directions of endoscopic AI technology development that will bridge the current gap.
27
Artificial Intelligence in Gastroenterology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_163]
28
Artificial intelligence in dry eye disease. Ocul Surf 2021; 23:74-86. [PMID: 34843999] [DOI: 10.1016/j.jtos.2021.11.004]
Abstract
Dry eye disease (DED) has a prevalence of between 5 and 50%, depending on the diagnostic criteria used and population under study. However, it remains one of the most underdiagnosed and undertreated conditions in ophthalmology. Many tests used in the diagnosis of DED rely on an experienced observer for image interpretation, which may be considered subjective and result in variation in diagnosis. Since artificial intelligence (AI) systems are capable of advanced problem solving, use of such techniques could lead to more objective diagnosis. Although the term 'AI' is commonly used, recent success in its applications to medicine is mainly due to advancements in the sub-field of machine learning, which has been used to automatically classify images and predict medical outcomes. Powerful machine learning techniques have been harnessed to understand nuances in patient data and medical images, aiming for consistent diagnosis and stratification of disease severity. This is the first literature review on the use of AI in DED. We provide a brief introduction to AI, report its current use in DED research and its potential for application in the clinic. Our review found that AI has been employed in a wide range of DED clinical tests and research applications, primarily for interpretation of interferometry, slit-lamp and meibography images. While initial results are promising, much work is still needed on model development, clinical testing and standardisation.
29
Impact of Image Resolution on Deep Learning Performance in Endoscopy Image Classification: An Experimental Study Using a Large Dataset of Endoscopic Images. Diagnostics (Basel) 2021; 11:2183. [PMID: 34943421] [PMCID: PMC8700246] [DOI: 10.3390/diagnostics11122183]
Abstract
Recent trials have evaluated the efficacy of deep convolutional neural network (CNN)-based AI systems to improve lesion detection and characterization in endoscopy. Impressive results are achieved, but many medical studies use a very small image resolution to save computing resources at the cost of losing details. Today, no conventions between resolution and performance exist, and monitoring the performance of various CNN architectures as a function of image resolution provides insights into how subtleties of different lesions on endoscopy affect performance. This can help set standards for image or video characteristics for future CNN-based models in gastrointestinal (GI) endoscopy. This study examines the performance of CNNs on the HyperKvasir dataset, consisting of 10,662 images from 23 different findings. We evaluate two CNN models for endoscopic image classification under quality distortions with image resolutions ranging from 32 × 32 to 512 × 512 pixels. The performance is evaluated using two-fold cross-validation and F1-score, maximum Matthews correlation coefficient (MCC), precision, and sensitivity as metrics. Increased performance was observed with higher image resolution for all findings in the dataset. The best MCC for classification over the entire dataset, including all subclasses, was achieved at an image resolution of 512 × 512 pixels. The highest performance was observed with an MCC value of 0.9002 when the models were trained on the highest resolution and tested on the same resolution. Different resolutions and their effect on CNNs are explored. We show that image resolution has a clear influence on performance, which calls for standards in the field in the future.
|
30
|
DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine. Sci Rep 2021; 11:21896. [PMID: 34753975 PMCID: PMC8578227 DOI: 10.1038/s41598-021-01295-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2021] [Accepted: 10/26/2021] [Indexed: 11/09/2022] Open
Abstract
Recent global developments underscore the prominent role big data have in modern medical science. However, privacy issues constitute a prevalent problem for collecting and sharing data between researchers. Synthetic data generated to represent real data, carrying similar information and distributions, may alleviate this privacy issue. In this study, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 10-s 12-lead electrocardiograms (ECGs). We have developed and compared two methods, named WaveGAN* and Pulse2Pulse. We trained the GANs with 7,233 real normal ECGs to produce 121,977 DeepFake normal ECGs. By verifying the ECGs using a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that the Pulse2Pulse GAN was superior to WaveGAN* at producing realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. Although these synthetic ECGs mimic the dataset used for their creation, they are not linked to any individuals and may thus be used freely. The synthetic dataset will be available as open access for researchers at OSF.io, and the DeepFake generator is available at the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using generative adversarial networks on normal ECGs from two population studies, thereby addressing the relevant privacy issues in medical datasets.
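For illustration, a minimal 1-D waveform generator of the kind used in such work might look as follows. This is a sketch under assumed layer sizes, not the published WaveGAN* or Pulse2Pulse architectures; it only shows how a noise vector can be mapped to a 10-s, 12-lead ECG tensor (12 leads × 5000 samples at an assumed 500 Hz).

```python
# Minimal sketch, not the published Pulse2Pulse or WaveGAN* code: a 1-D
# transposed-convolution generator mapping noise to a synthetic 10-s,
# 12-lead ECG (12 x 5000 samples). Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ECGGenerator(nn.Module):
    def __init__(self, latent_dim: int = 100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 125)  # seed: 256 channels x 125 samples
        self.net = nn.Sequential(
            nn.ConvTranspose1d(256, 128, kernel_size=25, stride=5, padding=10),  # -> 625 samples
            nn.ReLU(),
            nn.ConvTranspose1d(128, 64, kernel_size=16, stride=4, padding=6),    # -> 2500 samples
            nn.ReLU(),
            nn.ConvTranspose1d(64, 12, kernel_size=8, stride=2, padding=3),      # -> 5000 samples, 12 leads
            nn.Tanh(),  # amplitudes scaled to [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.fc(z).view(z.size(0), 256, 125)
        return self.net(x)

if __name__ == "__main__":
    g = ECGGenerator()
    fake_ecgs = g(torch.randn(4, 100))  # 4 synthetic ECGs from random noise
    print(fake_ecgs.shape)              # torch.Size([4, 12, 5000])
```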
|
31
|
P–104 Assessment of sperm motility according to WHO classification using convolutional neural networks. Hum Reprod 2021. [DOI: 10.1093/humrep/deab130.103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Study question
How does convolutional neural network (CNN)-predicted sperm motility correlate with manual assessment according to the WHO guidelines?
Summary answer
CNN predicts sperm motility comparable to reference laboratories in the ESHRE-SIGA External Quality Assessment Programme for Semen Analysis.
What is known already
Manual sperm motility assessment according to WHO guidelines is regarded as the gold standard. To obtain reliable and reproducible results, comprehensive training is essential, as is running internal and external quality control. Prediction based on artificial intelligence can potentially transfer human-level performance into models that perform the task faster and avoid variation between human assessors. CNNs have been groundbreaking in image processing. To develop AI models with high predictive power, the dataset used should be of high quality, with sperm motility assessed according to WHO guidelines.
Study design, size, duration
Videos of 65 fresh semen samples obtained from the ESHRE-SIGA External Quality Assessment Programme for Semen Analysis (from the period 2006–2018) were used in the development of the model. One video was captured for each semen sample. Sperm motility data was obtained from manual assessment of the videos according to WHO criteria by reference laboratories in the programme. Rapid progressive motility was also included. Ten-fold cross-validation was used to compensate for the relatively small dataset.
Participants/materials, setting, methods
The mean values of the reference laboratories were used. Sparse optical flow of the sperm videos was generated from each second of each video and fed into a ResNet50 convolutional neural network. For training, Adam was used to optimize the weights and mean squared error (MSE) to measure loss. As a baseline, ZeroR (pseudo-regression) was performed. Results are reported as mean absolute error (MAE). For correlation analysis, Pearson's r was used.
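A rough sketch of such a pipeline is shown below. It is not the authors' implementation: the Lucas-Kanade flow rendering, input size and hyperparameters are assumptions, and it only illustrates feeding optical-flow images into a ResNet50 regression head trained with Adam and MSE loss.

```python
# Minimal sketch, not the authors' pipeline: render sparse Lucas-Kanade optical
# flow between consecutive frames onto a canvas, then regress the three WHO
# motility fractions with a ResNet50 trained using Adam and MSE loss.
# Tracker settings, image size and learning rate are assumptions.
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

def flow_image(prev_gray: np.ndarray, gray: np.ndarray) -> np.ndarray:
    """Draw sparse optical-flow tracks between two grayscale frames onto a blank canvas."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=5)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    canvas = np.zeros((*gray.shape, 3), dtype=np.uint8)
    for p0, p1, ok in zip(pts.reshape(-1, 2), nxt.reshape(-1, 2), status.reshape(-1)):
        if ok:
            cv2.line(canvas, tuple(map(int, p0)), tuple(map(int, p1)), (0, 255, 0), 1)
    return canvas

# ResNet50 with a 3-value regression head: fractions of progressive,
# non-progressive and immotile spermatozoa.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(flow_batch: torch.Tensor, targets: torch.Tensor) -> float:
    """flow_batch: (N, 3, 224, 224) flow images; targets: (N, 3) motility fractions."""
    optimizer.zero_grad()
    loss = loss_fn(model(flow_batch), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```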
Main results and the role of chance
Predicting sperm motility from the optical flow generated from the videos achieved an average MAE of 0.05 across progressive (0.06), non-progressive (0.04) and immotile sperm (0.05). The ZeroR baseline was 0.09, indicating that the method is able to capture the movement of the spermatozoa and predict motility with low error. Pearson's correlation between manually and AI-predicted motility showed r = 0.88 (p < 0.001) for progressive, r = 0.59 (p < 0.001) for non-progressive and r = 0.89 (p < 0.001) for immotile sperm. When predicting rapid progressive motility, the average MAE was 0.07 across rapid progressive (0.11), slow progressive (0.09), non-progressive (0.04) and immotile sperm (0.05). Pearson's correlation analysis between manually and AI-predicted motility showed r = 0.67 (p < 0.001) for rapid progressive, r = 0.41 (p < 0.001) for slow progressive, r = 0.51 (p < 0.001) for non-progressive and r = 0.88 (p < 0.001) for immotile sperm. The results show that differentiating between rapid progressive and slow progressive motility is difficult, but the model still does this better than the ZeroR baseline, which was 0.15 for rapid progressive and 0.11 for slow progressive. This is interesting, since rapid progressive motility has been regarded as challenging to assess. The next step would be to compare the results of the algorithm with human performance.
Limitations, reasons for caution
The sample size is small. The model is based on videos of high quality, and the performance may not transfer well to videos of lower quality. The performance for rapid progressive motility, which may have important clinical value, has to be improved.
Wider implications of the findings: This CNN model has the potential to assess sperm motility according to WHO guidelines for progressive motility and immotility. The error values for the automatic predictions are low, and the model shows good performance considering that only videos were used to perform the prediction.
Trial registration number
Not applicable
|
32
|
P–029 Identification of spermatozoa by unsupervised learning from video data. Hum Reprod 2021. [DOI: 10.1093/humrep/deab130.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Study question
Can artificial intelligence (AI) algorithms identify spermatozoa in a semen sample without using training data annotated by professionals?
Summary answer
Unsupervised AI methods can discriminate spermatozoa from other cells and debris. These unsupervised methods may have potential for several applications in reproductive medicine.
What is known already
Identification of individual sperm is essential to assess a given sperm sample’s motility behaviour. Existing computer-aided systems need training data based on annotations by professionals, which is resource demanding. On the other hand, data analysed by unsupervised machine learning algorithms can improve supervised algorithms that are more stable for clinical applications. Therefore, unsupervised sperm identification can improve computer-aided sperm analysis systems predicting different aspects of sperm samples. Other possible applications are assessing kinematics and counting of spermatozoa.
Study design, size, duration
Three sperm-like paint images were manipulated using a graphic design tool and used to train our AI system. Two paintings have an ash-coloured background and randomly distributed white circles, and one painting has a predefined pattern of circles. Selected semen sample videos from a public dataset, with videos obtained from 85 participants, were used to test our AI system.
Participants/materials, setting, methods
Generative adversarial networks (GANs) have become common AI methods for processing data in an unsupervised way. Based on single image frames extracted from videos, a GAN (SinGAN) can be trained to determine and track the locations of spermatozoa by translating the real images into localization paintings. The resulting model showed the potential to identify the presence of spermatozoa without any prior knowledge about the data.
Main results and the role of chance
Visual comparisons of localization paintings with real sperm images show that inverse training of SinGANs can track spermatozoa. Converting colour frames into grayscale frames and using grayscale synthetic sperm-like frames gave the best visual quality of the generated localization paintings. Feeding real sperm video frames to the SinGAN at different scaling factors, which define the resolution of the input image, produced different quality levels of the generated sperm localization paintings. A sperm frame given to the algorithm with a scaling factor of one leads to random sperm tracking, while scales two to four result in more accurate localization maps than scaling levels five to eight. In contrast, scales from six to eight result in an output close to the input frame. The proposed method is robust with respect to the number of spermatozoa, meaning that the detection works well for samples with a low or high sperm count. For visual comparisons, visit our Github page: https://vlbthambawita.github.io/singan-sperm/. The sperm tracking speed of our SinGAN using an NVIDIA 1080 graphics processing unit is around 17 frames per second, which can be improved by using parallel video processing capabilities. This shows the capability of using this method for real-time analysis.
Limitations, reasons for caution
Unsupervised methods are hard to train, and the results need human verification. The proposed method will need quality control and must be standardized. The unsupervised sperm-tracking SinGAN may identify blurry bright spots as non-existent sperm heads, which may restrict its use for sperm counting.
Wider implications of the findings: Assessment of semen samples according to the WHO guidelines is subjective and resource-demanding. This unsupervised model might be used to develop new systems for less time-consuming and more accurate evaluation of semen samples. It may also be used for real-time analysis of prepared spermatozoa for use in assisted reproduction technology.
Trial registration number
N/A
|
33
|
Artificial intelligence in the fertility clinic: status, pitfalls and possibilities. Hum Reprod 2021; 36:2429-2442. [PMID: 34324672 DOI: 10.1093/humrep/deab168] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Revised: 06/21/2021] [Indexed: 12/15/2022] Open
Abstract
In recent years, the amount of data produced in the field of ART has increased exponentially. The diversity of data is large, ranging from videos to tabular data. At the same time, artificial intelligence (AI) is progressively used in medical practice and may become a promising tool to improve success rates with ART. AI models may compensate for the lack of objectivity in several critical procedures in fertility clinics, especially embryo and sperm assessments. Various models have been developed, and even though several of them show promising performance, there are still many challenges to overcome. In this review, we present recent research on AI in the context of ART. We discuss the strengths and weaknesses of the presented methods, especially regarding clinical relevance. We also address the pitfalls hampering successful use of AI in the clinic and discuss future possibilities and important aspects to make AI truly useful for ART.
|
34
|
Using 3D Convolutional Neural Networks for Real-time Detection of Soccer Events. International Journal of Semantic Computing 2021. [DOI: 10.1142/s1793351x2140002x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Developing systems for the automatic detection of events in video is a task which has gained attention in many areas, including sports. More specifically, event detection for soccer videos has been studied widely in the literature. However, there are still a number of shortcomings in the state of the art, such as high latency, making it challenging to operate at the live edge. In this paper, we present an algorithm to detect events in soccer videos in real time, using 3D convolutional neural networks. We test our algorithm on three different datasets from SoccerNet, the Swedish Allsvenskan, and the Norwegian Eliteserien. Overall, the results show that we can detect events with high recall, low latency, and accurate time estimation. The trade-off is a slightly lower precision compared to the current state of the art, which has higher latency and performs better when a less accurate time estimation can be accepted. In addition to the presented algorithm, we perform an extensive ablation study on how the different parts of the training pipeline affect the final results.
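As an illustration of the clip-classification idea, the sketch below runs a stock 3D CNN (torchvision's r3d_18, an assumption rather than the paper's model) over a short window of frames; sliding such a window over a live stream yields per-window event probabilities with low latency.

```python
# Minimal sketch with an assumed off-the-shelf backbone (torchvision's r3d_18),
# not the paper's model: classify a short clip into event classes.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_EVENTS = 4  # e.g. goal, card, substitution, background (illustrative classes)

model = r3d_18(weights=None)
model.fc = nn.Linear(model.fc.in_features, NUM_EVENTS)
model.eval()

# A clip tensor has shape (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    probs = model(clip).softmax(dim=1)
print(probs)  # per-event probabilities for this window
```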
|
35
|
A Comprehensive Study on Colorectal Polyp Segmentation With ResUNet++, Conditional Random Field and Test-Time Augmentation. IEEE J Biomed Health Inform 2021; 25:2029-2040. [PMID: 33400658 DOI: 10.1109/jbhi.2021.3049304] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Colonoscopy is considered the gold standard for detection of colorectal cancer and its precursors. Existing examination methods are, however, hampered by a high overall miss rate, and many abnormalities are left undetected. Computer-aided diagnosis systems based on advanced machine learning algorithms are touted as a game-changer that can identify regions in the colon overlooked by physicians during endoscopic examinations, and help detect and characterize lesions. In previous work, we proposed the ResUNet++ architecture and demonstrated that it produces better results than its counterparts U-Net and ResUNet. In this paper, we demonstrate that further improvements to the overall prediction performance of the ResUNet++ architecture can be achieved by using Conditional Random Field (CRF) and Test-Time Augmentation (TTA). We have performed extensive evaluations and validated the improvements using six publicly available datasets: Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS-Larib Polyp DB, ASU-Mayo Clinic Colonoscopy Video Database, and CVC-VideoClinicDB. Moreover, we compare our proposed architecture and resulting model with other state-of-the-art methods. To explore the generalization capability of ResUNet++ on different publicly available polyp datasets, so that it could be used in a real-world setting, we performed an extensive cross-dataset evaluation. The experimental results show that applying CRF and TTA improves the performance on various polyp segmentation datasets, both within-dataset and cross-dataset. To assess the model's performance on difficult-to-detect polyps, we selected, with the help of an expert gastroenterologist, 196 sessile or flat polyps that are less than ten millimeters in size. This additional data has been made available as a subset of Kvasir-SEG. Our approaches showed good results for flat or sessile and smaller polyps, which are known to be one of the major reasons for high polyp miss rates. This is one of the significant strengths of our work and indicates that our methods should be investigated further for use in clinical practice.
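Test-time augmentation of the kind mentioned above can be sketched as follows. This is a generic flip-and-average example assuming a pretrained binary segmentation network (such as ResUNet++) is loaded elsewhere; it is not the paper's exact augmentation set.

```python
# Minimal sketch of test-time augmentation (TTA) for binary polyp segmentation:
# average the model's sigmoid masks over flipped views of the input, flipping
# each prediction back before averaging. The segmentation network itself is an
# assumed pretrained model loaded elsewhere.
import torch

@torch.no_grad()
def tta_predict(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, H, W); returns an averaged probability mask of shape (1, 1, H, W)."""
    views = [
        (lambda x: x,                   lambda y: y),                    # identity
        (lambda x: torch.flip(x, [-1]), lambda y: torch.flip(y, [-1])),  # horizontal flip
        (lambda x: torch.flip(x, [-2]), lambda y: torch.flip(y, [-2])),  # vertical flip
    ]
    masks = [undo(torch.sigmoid(model(fwd(image)))) for fwd, undo in views]
    return torch.stack(masks).mean(dim=0)
```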
|
36
|
Kvasir-Capsule, a video capsule endoscopy dataset. Sci Data 2021; 8:142. [DOI: 10.1038/s41597-021-00920-z]
Abstract
Artificial intelligence (AI) is predicted to have profound effects on the future of video capsule endoscopy (VCE) technology. The potential lies in improving anomaly detection while reducing manual labour. Existing work demonstrates the promising benefits of AI-based computer-assisted diagnosis systems for VCE, and also shows that there is great potential for further improvement. Moreover, medical data are often sparse and unavailable to the research community, and qualified medical personnel rarely have time for the tedious labelling work. We present Kvasir-Capsule, a large VCE dataset collected from examinations at a Norwegian hospital. Kvasir-Capsule consists of 117 videos from which a total of 4,741,504 image frames can be extracted. We have labelled and medically verified 47,238 frames with a bounding box around findings from 14 different classes. In addition to these labelled images, 4,694,266 unlabelled frames are included in the dataset. The Kvasir-Capsule dataset can play a valuable role in developing better algorithms in order to reach the true potential of VCE technology.
|
37
|
Real-Time Polyp Detection, Localization and Segmentation in Colonoscopy Using Deep Learning. IEEE Access 2021; 9:40496-40510. [PMID: 33747684 PMCID: PMC7968127 DOI: 10.1109/access.2021.3063716] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Accepted: 02/15/2021] [Indexed: 05/16/2023]
Abstract
Computer-aided detection, localisation, and segmentation methods can help improve colonoscopy procedures. Even though many methods have been built to tackle automatic detection and segmentation of polyps, benchmarking of state-of-the-art methods still remains an open problem. This is due to the increasing number of researched computer vision methods that can be applied to polyp datasets. Benchmarking of novel methods can provide a direction for the development of automated polyp detection and segmentation tasks. Furthermore, it ensures that the results produced in the community are reproducible and provide a fair comparison of developed methods. In this paper, we benchmark several recent state-of-the-art methods using Kvasir-SEG, an open-access dataset of colonoscopy images for polyp detection, localisation, and segmentation, evaluating both method accuracy and speed. Whilst most methods in the literature have competitive accuracy, we show that the proposed ColonSegNet achieved a better trade-off between accuracy and speed, with an average precision of 0.8000, a mean IoU of 0.8100, and the fastest speed of 180 frames per second for the detection and localisation task. Likewise, the proposed ColonSegNet achieved a competitive dice coefficient of 0.8206 and the best average speed of 182.38 frames per second for the segmentation task. Our comprehensive comparison with various state-of-the-art methods reveals the importance of benchmarking deep learning methods for automated real-time polyp identification and delineation, which can potentially transform current clinical practice and minimise miss-detection rates.
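The accuracy and speed criteria used in such benchmarks can be computed along these lines. This is an illustrative sketch, not the benchmark harness itself: Dice coefficient, intersection over union (IoU) and a simple frames-per-second measurement for a binary segmentation model.

```python
# Minimal sketch, not the benchmark harness: Dice, IoU and FPS for a binary
# segmentation model. Masks are assumed to be thresholded {0, 1} tensors.
import time
import torch

def dice_iou(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """pred, target: binary masks of identical shape; returns (dice, iou)."""
    pred, target = pred.float(), target.float()
    inter = (pred * target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (pred.sum() + target.sum() - inter + eps)
    return dice.item(), iou.item()

@torch.no_grad()
def measure_fps(model: torch.nn.Module, image: torch.Tensor, runs: int = 100) -> float:
    """Average inference throughput over repeated forward passes on one image."""
    model.eval()
    start = time.perf_counter()
    for _ in range(runs):
        model(image)
    return runs / (time.perf_counter() - start)
```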
|
38
|
A comprehensive analysis of classification methods in gastrointestinal endoscopy imaging. Med Image Anal 2021; 70:102007. [PMID: 33740740 DOI: 10.1016/j.media.2021.102007] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2020] [Revised: 01/20/2021] [Accepted: 02/16/2021] [Indexed: 12/24/2022]
Abstract
Gastrointestinal (GI) endoscopy has been an active field of research motivated by the large number of highly lethal GI cancers. Early GI cancer precursors are often missed during endoscopic surveillance. The high miss rate of such abnormalities during endoscopy is thus a critical bottleneck. Lack of attentiveness due to tiring procedures and the requirement for training are among the contributing factors. An automatic GI disease classification system can help reduce such risks by flagging suspicious frames and lesions. GI endoscopy involves surveillance of several organs; therefore, there is a need to develop methods that can generalize to various endoscopic findings. In this realm, we present a comprehensive analysis of the Medico GI challenges: the Medical Multimedia Task at MediaEval 2017, the Medico Multimedia Task at MediaEval 2018, and the BioMedia ACM MM Grand Challenge 2019. These challenges are an initiative to set up a benchmark for different computer vision methods applied to multi-class endoscopic images and to promote the development of new approaches that could reliably be used in clinics. We report the performance of 21 participating teams over a period of three consecutive years and provide a detailed analysis of the methods used by the participants, highlighting the challenges and shortcomings of the current approaches and dissecting their credibility for use in clinical settings. Our analysis revealed that the participants improved the maximum Matthews correlation coefficient (MCC) from 82.68% in the 2017 challenge to 93.98% in 2018 and 95.20% in 2019, and achieved a significant increase in computational speed over consecutive years.
|
39
|
Comparative validation of multi-instance instrument segmentation in endoscopy: Results of the ROBUST-MIS 2019 challenge. Med Image Anal 2020; 70:101920. [PMID: 33676097 DOI: 10.1016/j.media.2020.101920] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2020] [Revised: 09/22/2020] [Accepted: 11/24/2020] [Indexed: 12/27/2022]
Abstract
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions. While numerous methods for detecting, segmenting and tracking medical instruments based on endoscopic video images have been proposed in the literature, key limitations remain to be addressed: firstly, robustness, that is, the reliable performance of state-of-the-art methods when run on challenging images (e.g. in the presence of blood, smoke or motion artifacts); secondly, generalization; algorithms trained for a specific intervention in a specific hospital should generalize to other interventions or institutions. In an effort to promote solutions for these limitations, we organized the Robust Medical Instrument Segmentation (ROBUST-MIS) challenge as an international benchmarking competition with a specific focus on the robustness and generalization capabilities of algorithms. For the first time in the field of endoscopic image processing, our challenge included a task on binary segmentation and also addressed multi-instance detection and segmentation. The challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures from three different types of surgery. The validation of the competing methods for the three tasks (binary segmentation, multi-instance detection and multi-instance segmentation) was performed in three different stages with an increasing domain gap between the training and the test data. The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap. While the average detection and segmentation quality of the best-performing algorithms is high, future research should concentrate on detection and segmentation of small, crossing, moving and transparent instrument(s) (parts).
|
40
|
HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data 2020; 7:283. [PMID: 32859981 PMCID: PMC7455694 DOI: 10.1038/s41597-020-00622-y] [Citation(s) in RCA: 89] [Impact Index Per Article: 22.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Accepted: 07/21/2020] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence is currently a hot topic in medicine. However, medical data is often sparse and hard to obtain due to legal restrictions and the lack of medical personnel for the cumbersome and tedious process of manually labelling training data. These constraints make it difficult to develop systems for automatic analysis, such as detecting disease or other lesions. In this respect, this article presents HyperKvasir, the largest image and video dataset of the gastrointestinal tract available today. The data were collected during real gastro- and colonoscopy examinations at Bærum Hospital in Norway and partly labelled by experienced gastrointestinal endoscopists. The dataset contains 110,079 images and 374 videos, and represents anatomical landmarks as well as pathological and normal findings. The total number of images and video frames together is around 1 million. Initial experiments demonstrate the potential benefits of artificial intelligence-based computer-assisted diagnosis systems. The HyperKvasir dataset can play a valuable role in developing better algorithms and computer-assisted examination systems, not only for gastro- and colonoscopy but also for other fields in medicine.
|
41
|
An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning Applied to Gastrointestinal Tract Abnormality Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 2020. [DOI: 10.1145/3386295] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Precise and efficient automated identification of gastrointestinal (GI) tract diseases can help doctors treat more patients and improve the rate of disease detection and identification. Currently, automatic analysis of diseases in the GI tract is a hot topic in both computer science and medical journals. Nevertheless, the evaluation of such automatic analyses is often incomplete or simply wrong. Algorithms are often only tested on small and biased datasets, and cross-dataset evaluations are rarely performed. A clear understanding of evaluation metrics and of how machine learning models behave across datasets is crucial to bring research in the field to a new quality level. Toward this goal, we present comprehensive evaluations of five distinct machine learning models, using global features and deep neural networks, that can classify 16 different key types of GI tract conditions, including pathological findings, anatomical landmarks, polyp removal conditions, and normal findings, from images captured by common GI tract examination instruments. In our evaluation, we introduce performance hexagons using six performance metrics, namely recall, precision, specificity, accuracy, F1-score, and the Matthews correlation coefficient, to demonstrate how to determine the real capabilities of models rather than evaluating them shallowly. Furthermore, we perform cross-dataset evaluations using different datasets for training and testing. With these cross-dataset evaluations, we demonstrate the challenge of actually building a generalizable model that could be used across different hospitals. Our experiments clearly show that more sophisticated performance metrics and evaluation methods need to be applied to get reliable models, rather than depending on evaluations of splits of the same dataset; that is, performance metrics should always be interpreted together rather than relying on a single metric.
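A performance hexagon of the kind described can be assembled from standard scikit-learn metrics, as in the sketch below (illustrative only; macro-averaged specificity is derived manually from the confusion matrix, since scikit-learn has no direct function for it).

```python
# Minimal sketch, not the paper's evaluation code: the six metrics behind a
# "performance hexagon" for a multi-class prediction, using scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score, recall_score)

def performance_hexagon(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    # Per-class true negatives and false positives, then macro-averaged specificity.
    tn = cm.sum() - cm.sum(axis=0) - cm.sum(axis=1) + np.diag(cm)
    fp = cm.sum(axis=0) - np.diag(cm)
    specificity = np.mean(tn / (tn + fp))
    return {
        "recall":      recall_score(y_true, y_pred, average="macro"),
        "precision":   precision_score(y_true, y_pred, average="macro"),
        "specificity": specificity,
        "accuracy":    accuracy_score(y_true, y_pred),
        "f1":          f1_score(y_true, y_pred, average="macro"),
        "mcc":         matthews_corrcoef(y_true, y_pred),
    }

# Toy example with three classes.
print(performance_hexagon([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 0]))
```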
|
42
|
|