1
|
Crowdsourcing with the drift diffusion model of decision making. Sci Rep 2024; 14:11311. [PMID: 38760397 DOI: 10.1038/s41598-024-61687-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Accepted: 05/08/2024] [Indexed: 05/19/2024] Open
Abstract
Crowdsourcing involves the use of annotated labels with unknown reliability to estimate ground truth labels in datasets. A common task in crowdsourcing involves estimating reliabilities of annotators (such as through the sensitivities and specificities of annotators in the binary label setting). In the literature, beta or dirichlet distributions are typically imposed as priors on annotator reliability. In this study, we investigated the use of a neuroscientifically validated model of decision making, known as the drift-diffusion model, as a prior on the annotator labeling process. Two experiments were conducted on synthetically generated data with non-linear (sinusoidal) decision boundaries. Variational inference was used to predict ground truth labels and annotator related parameters. Our method performed similarly to a state-of-the-art technique (SVGPCR) in prediction of crowdsourced data labels and prediction through a crowdsourced-generated Gaussian process classifier. By relying on a neuroscientifically validated model of decision making to model annotator behavior, our technique opens the avenue of predicting neuroscientific biomarkers of annotators, expanding the scope of what may be learnt about annotators in crowdsourcing tasks.
Collapse
|
2
|
A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health. J Med Internet Res 2024; 26:e51138. [PMID: 38602750 PMCID: PMC11046386 DOI: 10.2196/51138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2023] [Revised: 11/15/2023] [Accepted: 01/30/2024] [Indexed: 04/12/2024] Open
Abstract
Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power can be considered to be both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers are paid to annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work in this emerging field and discusses ongoing challenges and opportunities with crowd-powered diagnostic systems, a nascent field of study. With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.
Collapse
|
3
|
Digitally Diagnosing Multiple Developmental Delays Using Crowdsourcing Fused With Machine Learning: Protocol for a Human-in-the-Loop Machine Learning Study. JMIR Res Protoc 2024; 13:e52205. [PMID: 38329783 PMCID: PMC10884895 DOI: 10.2196/52205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2023] [Revised: 12/17/2023] [Accepted: 12/26/2023] [Indexed: 02/09/2024] Open
Abstract
BACKGROUND A considerable number of minors in the United States are diagnosed with developmental or psychiatric conditions, potentially influenced by underdiagnosis factors such as cost, distance, and clinician availability. Despite the potential of digital phenotyping tools with machine learning (ML) approaches to expedite diagnoses and enhance diagnostic services for pediatric psychiatric conditions, existing methods face limitations because they use a limited set of social features for prediction tasks and focus on a single binary prediction, resulting in uncertain accuracies. OBJECTIVE This study aims to propose the development of a gamified web system for data collection, followed by a fusion of novel crowdsourcing algorithms with ML behavioral feature extraction approaches to simultaneously predict diagnoses of autism spectrum disorder and attention-deficit/hyperactivity disorder in a precise and specific manner. METHODS The proposed pipeline will consist of (1) gamified web applications to curate videos of social interactions adaptively based on the needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) the development of ML models that classify several conditions simultaneously and that adaptively request additional information based on uncertainties about the data. RESULTS A preliminary version of the web interface has been implemented, and a prior feature selection method has highlighted a core set of behavioral features that can be targeted through the proposed gamified approach. CONCLUSIONS The prospect for high reward stems from the possibility of creating the first artificial intelligence-powered tool that can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as autism spectrum disorder and attention-deficit/hyperactivity disorder. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) PRR1-10.2196/52205.
Collapse
|
4
|
A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism. Annu Rev Biomed Data Sci 2023; 6:211-228. [PMID: 37137169 PMCID: PMC11093217 DOI: 10.1146/annurev-biodatasci-020722-125454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/05/2023]
Abstract
Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly.
Collapse
|
5
|
Training and Profiling a Pediatric Facial Expression Classifier for Children on Mobile Devices: Machine Learning Study. JMIR Form Res 2023; 7:e39917. [PMID: 35962462 PMCID: PMC10131663 DOI: 10.2196/39917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2022] [Revised: 08/01/2022] [Accepted: 08/09/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Implementing automated facial expression recognition on mobile devices could provide an accessible diagnostic and therapeutic tool for those who struggle to recognize facial expressions, including children with developmental behavioral conditions such as autism. Despite recent advances in facial expression classifiers for children, existing models are too computationally expensive for smartphone use. OBJECTIVE We explored several state-of-the-art facial expression classifiers designed for mobile devices, used posttraining optimization techniques for both classification performance and efficiency on a Motorola Moto G6 phone, evaluated the importance of training our classifiers on children versus adults, and evaluated the models' performance against different ethnic groups. METHODS We collected images from 12 public data sets and used video frames crowdsourced from the GuessWhat app to train our classifiers. All images were annotated for 7 expressions: neutral, fear, happiness, sadness, surprise, anger, and disgust. We tested 3 copies for each of 5 different convolutional neural network architectures: MobileNetV3-Small 1.0x, MobileNetV2 1.0x, EfficientNetB0, MobileNetV3-Large 1.0x, and NASNetMobile. We trained the first copy on images of children, second copy on images of adults, and third copy on all data sets. We evaluated each model against the entire Child Affective Facial Expression (CAFE) set and by ethnicity. We performed weight pruning, weight clustering, and quantize-aware training when possible and profiled each model's performance on the Moto G6. RESULTS Our best model, a MobileNetV3-Large network pretrained on ImageNet, achieved 65.78% accuracy and 65.31% F1-score on the CAFE and a 90-millisecond inference latency on a Moto G6 phone when trained on all data. This accuracy is only 1.12% lower than the current state of the art for CAFE, a model with 13.91x more parameters that was unable to run on the Moto G6 due to its size, even when fully optimized. When trained solely on children, this model achieved 60.57% accuracy and 60.29% F1-score. When trained only on adults, the model received 53.36% accuracy and 53.10% F1-score. Although the MobileNetV3-Large trained on all data sets achieved nearly a 60% F1-score across all ethnicities, the data sets for South Asian and African American children achieved lower accuracy (as much as 11.56%) and F1-score (as much as 11.25%) than other groups. CONCLUSIONS With specialized design and optimization techniques, facial expression classifiers can become lightweight enough to run on mobile devices and achieve state-of-the-art performance. There is potentially a "data shift" phenomenon between facial expressions of children compared with adults; our classifiers performed much better when trained on children. Certain underrepresented ethnic groups (e.g., South Asian and African American) also perform significantly worse than groups such as European Caucasian despite similar data quality. Our models can be integrated into mobile health therapies to help diagnose autism spectrum disorder and provide targeted therapeutic treatment to children.
Collapse
|
6
|
Digitally Diagnosing Multiple Developmental Delays using Crowdsourcing fused with Machine Learning: A Research Protocol. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023. [PMID: 36945467 PMCID: PMC10029023 DOI: 10.1101/2023.03.05.23286817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/09/2023]
Abstract
Background Roughly 17% percent of minors in the United States aged 3 through 17 years have a diagnosis of one or more developmental or psychiatric conditions, with the true prevalence likely being higher due to underdiagnosis in rural areas and for minority populations. Unfortunately, timely diagnostic services are inaccessible to a large portion of the United States and global population due to cost, distance, and clinician availability. Digital phenotyping tools have the potential to shorten the time-to-diagnosis and to bring diagnostic services to more people by enabling accessible evaluations. While automated machine learning (ML) approaches for detection of pediatric psychiatry conditions have garnered increased research attention in recent years, existing approaches use a limited set of social features for the prediction task and focus on a single binary prediction. Objective I propose the development of a gamified web system for data collection followed by a fusion of novel crowdsourcing algorithms with machine learning behavioral feature extraction approaches to simultaneously predict diagnoses of Autism Spectrum Disorder (ASD) and Attention-Deficit/Hyperactivity Disorder (ADHD) in a precise and specific manner. Methods The proposed pipeline will consist of: (1) a gamified web applications to curate videos of social interactions adaptively based on needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) development of ML models which classify several conditions simultaneously and which adaptively request additional information based on uncertainties about the data. Conclusions The prospective for high reward stems from the possibility of creating the first AI-powered tool which can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as ASD and ADHD.
Collapse
|
7
|
Session Introduction: TOWARDS ETHICAL BIOMEDICAL INFORMATICS: LEARNING FROM OLELO NOEAU, HAWAIIAN PROVERBS. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2023; 28:461-471. [PMID: 36541000 PMCID: PMC11095408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Innovations in human-centered biomedical informatics are often developed with the eventual goal of real-world translation. While biomedical research questions are usually answered in terms of how a method performs in a particular context, we argue that it is equally important to consider and formally evaluate the ethical implications of informatics solutions. Several new research paradigms have arisen as a result of the consideration of ethical issues, including but not limited for privacy-preserving computation and fair machine learning. In the spirit of the Pacific Symposium on Biocomputing, we discuss broad and fundamental principles of ethical biomedical informatics in terms of Olelo Noeau, or Hawaiian proverbs and poetical sayings that capture Hawaiian values. While we emphasize issues related to privacy and fairness in particular, there are a multitude of facets to ethical biomedical informatics that can benefit from a critical analysis grounded in ethics.
Collapse
|
8
|
Early childhood tracking application: Correspondence between crowd-based developmental percentiles and clinical tools. Health Informatics J 2023; 29:14604582231164695. [PMID: 36914414 DOI: 10.1177/14604582231164695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/14/2023]
Abstract
Barriers to child developmental screening lead to delayed diagnosis and intervention. babyTRACKS, a mobile application for tracking developmental milestones, presents parents their child's percentiles computed relative to crowd-based data. This study evaluated correspondence between crowd-based percentiles and traditional development measures. Research analyzed babyTRACKS diaries of 1951 children. Parents recorded attainment age for milestones across Gross Motor, Fine Motor, Language, Cognitive, and Social domains. Fifty-seven parents completed the Ages and Stages Questionnaire (ASQ-3), and 13 families participated in the Mullen Scales of Early Learning (MSEL) expert assessment. Crowd-based percentiles were compared with: Centers for Disease Control (CDC) norms for comparable milestones, ASQ-3 and MSEL scores. babyTRACKS percentiles correlated with the percentage of unmet CDC milestones, and with higher ASQ-3 and MSEL scores across several domains. Children who did not meet CDC age thresholds had lower babyTRACKS percentiles by about 20 points and those at ASQ-3 risk had lower babyTRACKS Fine Motor and Language scores. Repeated measures tests showed significantly higher MSEL versus babyTRACKS percentiles in the Language domain. Although ages and milestones in diary varied, the app percentiles corresponded with traditional measures, particularly in fine motor and language domains. Future research is needed for determining referral thresholds while minimizing false alarms.
Collapse
|
9
|
The Classification of Abnormal Hand Movement to Aid in Autism Detection: Machine Learning Study. JMIR BIOMEDICAL ENGINEERING 2022. [DOI: 10.2196/33771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
Abstract
Background
A formal autism diagnosis can be an inefficient and lengthy process. Families may wait several months or longer before receiving a diagnosis for their child despite evidence that earlier intervention leads to better treatment outcomes. Digital technologies that detect the presence of behaviors related to autism can scale access to pediatric diagnoses. A strong indicator of the presence of autism is self-stimulatory behaviors such as hand flapping.
Objective
This study aims to demonstrate the feasibility of deep learning technologies for the detection of hand flapping from unstructured home videos as a first step toward validation of whether statistical models coupled with digital technologies can be leveraged to aid in the automatic behavioral analysis of autism. To support the widespread sharing of such home videos, we explored privacy-preserving modifications to the input space via conversion of each video to hand landmark coordinates and measured the performance of corresponding time series classifiers.
Methods
We used the Self-Stimulatory Behavior Dataset (SSBD) that contains 75 videos of hand flapping, head banging, and spinning exhibited by children. From this data set, we extracted 100 hand flapping videos and 100 control videos, each between 2 to 5 seconds in duration. We evaluated five separate feature representations: four privacy-preserved subsets of hand landmarks detected by MediaPipe and one feature representation obtained from the output of the penultimate layer of a MobileNetV2 model fine-tuned on the SSBD. We fed these feature vectors into a long short-term memory network that predicted the presence of hand flapping in each video clip.
Results
The highest-performing model used MobileNetV2 to extract features and achieved a test F1 score of 84 (SD 3.7; precision 89.6, SD 4.3 and recall 80.4, SD 6) using 5-fold cross-validation for 100 random seeds on the SSBD data (500 total distinct folds). Of the models we trained on privacy-preserved data, the model trained with all hand landmarks reached an F1 score of 66.6 (SD 3.35). Another such model trained with a select 6 landmarks reached an F1 score of 68.3 (SD 3.6). A privacy-preserved model trained using a single landmark at the base of the hands and a model trained with the average of the locations of all the hand landmarks reached an F1 score of 64.9 (SD 6.5) and 64.2 (SD 6.8), respectively.
Conclusions
We created five lightweight neural networks that can detect hand flapping from unstructured videos. Training a long short-term memory network with convolutional feature vectors outperformed training with feature vectors of hand coordinates and used almost 900,000 fewer model parameters. This study provides the first step toward developing precise deep learning methods for activity detection of autism-related behaviors.
Collapse
|
10
|
Deep Learning-Based Autism Spectrum Disorder Detection Using Emotion Features From Video Recordings (Preprint). JMIR BIOMEDICAL ENGINEERING 2022. [DOI: 10.2196/39982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
11
|
Classifying Autism From Crowdsourced Semistructured Speech Recordings: Machine Learning Model Comparison Study. JMIR Pediatr Parent 2022; 5:e35406. [PMID: 35436234 PMCID: PMC9052034 DOI: 10.2196/35406] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/09/2021] [Revised: 01/18/2022] [Accepted: 01/25/2022] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Autism spectrum disorder (ASD) is a neurodevelopmental disorder that results in altered behavior, social development, and communication patterns. In recent years, autism prevalence has tripled, with 1 in 44 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process that requires the work of trained physicians, significant attention has been given to developing systems that automatically detect autism. We work toward this goal by analyzing audio data, as prosody abnormalities are a signal of autism, with affected children displaying speech idiosyncrasies such as echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. OBJECTIVE We aimed to test the ability for machine learning approaches to aid in detection of autism in self-recorded speech audio captured from children with ASD and neurotypical (NT) children in their home environments. METHODS We considered three methods to detect autism in child speech: (1) random forests trained on extracted audio features (including Mel-frequency cepstral coefficients); (2) convolutional neural networks trained on spectrograms; and (3) fine-tuned wav2vec 2.0-a state-of-the-art transformer-based speech recognition model. We trained our classifiers on our novel data set of cellphone-recorded child speech audio curated from the Guess What? mobile game, an app designed to crowdsource videos of children with ASD and NT children in a natural home environment. RESULTS The random forest classifier achieved 70% accuracy, the fine-tuned wav2vec 2.0 model achieved 77% accuracy, and the convolutional neural network achieved 79% accuracy when classifying children's audio as either ASD or NT. We used 5-fold cross-validation to evaluate model performance. CONCLUSIONS Our models were able to predict autism status when trained on a varied selection of home audio clips with inconsistent recording qualities, which may be more representative of real-world conditions. The results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.
Collapse
|
12
|
Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study. JMIR Pediatr Parent 2022; 5:e26760. [PMID: 35394438 PMCID: PMC9034430 DOI: 10.2196/26760] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 03/24/2021] [Accepted: 01/03/2022] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. OBJECTIVE We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. METHODS We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. RESULTS The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotions labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class. CONCLUSIONS This work validates that mobile games designed for pediatric therapies can generate high volumes of domain-relevant data sets to train state-of-the-art classifiers to perform tasks helpful to precision health efforts.
Collapse
|
13
|
Crowd annotations can approximate clinical autism impressions from short home videos with privacy protections. INTELLIGENCE-BASED MEDICINE 2022; 6. [PMID: 35634270 PMCID: PMC9139408 DOI: 10.1016/j.ibmed.2022.100056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Artificial Intelligence (A.I.) solutions are increasingly considered for telemedicine. For these methods to serve children and their families in home settings, it is crucial to ensure the privacy of the child and parent or caregiver. To address this challenge, we explore the potential for global image transformations to provide privacy while preserving the quality of behavioral annotations. Crowd workers have previously been shown to reliably annotate behavioral features in unstructured home videos, allowing machine learning classifiers to detect autism using the annotations as input. We evaluate this method with videos altered via pixelation, dense optical flow, and Gaussian blurring. On a balanced test set of 30 videos of children with autism and 30 neurotypical controls, we find that the visual privacy alterations do not drastically alter any individual behavioral annotation at the item level. The AUROC on the evaluation set was 90.0% ±7.5% for unaltered videos, 85.0% ±9.0% for pixelation, 85.0% ±9.0% for optical flow, and 83.3% ±9.3% for blurring, demonstrating that an aggregation of small changes across behavioral questions can collectively result in increased misdiagnosis rates. We also compare crowd answers against clinicians who provided the same annotations for the same videos as crowd workers, and we find that clinicians have higher sensitivity in their recognition of autism-related symptoms. We also find that there is a linear correlation (r = 0.75, p < 0.0001) between the mean Clinical Global Impression (CGI) score provided by professional clinicians and the corresponding score emitted by a previously validated autism classifier with crowd inputs, indicating that the classifier’s output probability is a reliable estimate of the clinical impression of autism. A significant correlation is maintained with privacy alterations, indicating that crowd annotations can approximate clinician-provided autism impression from home videos in a privacy-preserved manner.
Collapse
|
14
|
A Local Community-Based Social Network for Mental Health and Well-being (Quokka): Exploratory Feasibility Study. JMIRX MED 2021; 2:e24972. [PMID: 37725541 PMCID: PMC10414255 DOI: 10.2196/24972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/12/2020] [Revised: 03/30/2021] [Accepted: 07/25/2021] [Indexed: 09/21/2023]
Abstract
BACKGROUND Developing healthy habits and maintaining prolonged behavior changes are often difficult tasks. Mental health is one of the largest health concerns globally, including for college students. OBJECTIVE Our aim was to conduct an exploratory feasibility study of local community-based interventions by developing Quokka, a web platform promoting well-being activity on university campuses. We evaluated the intervention's potential for promotion of local, social, and unfamiliar activities pertaining to healthy habits. METHODS To evaluate this framework's potential for increased participation in healthy habits, we conducted a 6-to-8-week feasibility study via a "challenge" across 4 university campuses with a total of 277 participants. We chose a different well-being theme each week, and we conducted weekly surveys to (1) gauge factors that motivated users to complete or not complete the weekly challenge, (2) identify participation trends, and (3) evaluate the feasibility of the intervention to promote local, social, and novel well-being activities. We tested the hypotheses that Quokka participants would self-report participation in more local activities than remote activities for all challenges (Hypothesis H1), more social activities than individual activities (Hypothesis H2), and new rather than familiar activities (Hypothesis H3). RESULTS After Bonferroni correction using a Clopper-Pearson binomial proportion confidence interval for one test, we found that there was a strong preference for local activities for all challenge themes. Similarly, users significantly preferred group activities over individual activities (P<.001 for most challenge themes). For most challenge themes, there were not enough data to significantly distinguish a preference toward familiar or new activities (P<.001 for a subset of challenge themes in some schools). CONCLUSIONS We find that local community-based well-being interventions such as Quokka can facilitate positive behaviors. We discuss these findings and their implications for the research and design of location-based digital communities for well-being promotion.
Collapse
|
15
|
Crowdsourced privacy-preserved feature tagging of short home videos for machine learning ASD detection. Sci Rep 2021; 11:7620. [PMID: 33828118 PMCID: PMC8027393 DOI: 10.1038/s41598-021-87059-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 03/22/2021] [Indexed: 02/01/2023] Open
Abstract
Standard medical diagnosis of mental health conditions requires licensed experts who are increasingly outnumbered by those at risk, limiting reach. We test the hypothesis that a trustworthy crowd of non-experts can efficiently annotate behavioral features needed for accurate machine learning detection of the common childhood developmental disorder Autism Spectrum Disorder (ASD) for children under 8 years old. We implement a novel process for identifying and certifying a trustworthy distributed workforce for video feature extraction, selecting a workforce of 102 workers from a pool of 1,107. Two previously validated ASD logistic regression classifiers, evaluated against parent-reported diagnoses, were used to assess the accuracy of the trusted crowd's ratings of unstructured home videos. A representative balanced sample (N = 50 videos) of videos were evaluated with and without face box and pitch shift privacy alterations, with AUROC and AUPRC scores > 0.98. With both privacy-preserving modifications, sensitivity is preserved (96.0%) while maintaining specificity (80.0%) and accuracy (88.0%) at levels comparable to prior classification methods without alterations. We find that machine learning classification from features extracted by a certified nonexpert crowd achieves high performance for ASD detection from natural home videos of the child at risk and maintains high sensitivity when privacy-preserving mechanisms are applied. These results suggest that privacy-safeguarded crowdsourced analysis of short home videos can help enable rapid and mobile machine-learning detection of developmental delays in children.
Collapse
|
16
|
Selection of trustworthy crowd workers for telemedical diagnosis of pediatric autism spectrum disorder. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2021; 26:14-25. [PMID: 33691000 PMCID: PMC7958981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
Abstract
Crowd-powered telemedicine has the potential to revolutionize healthcare, especially during times that require remote access to care. However, sharing private health data with strangers from around the world is not compatible with data privacy standards, requiring a stringent filtration process to recruit reliable and trustworthy workers who can go through the proper training and security steps. The key challenge, then, is to identify capable, trustworthy, and reliable workers through high-fidelity evaluation tasks without exposing any sensitive patient data during the evaluation process. We contribute a set of experimentally validated metrics for assessing the trustworthiness and reliability of crowd workers tasked with providing behavioral feature tags to unstructured videos of children with autism and matched neurotypical controls. The workers are blinded to diagnosis and blinded to the goal of using the features to diagnose autism. These behavioral labels are fed as input to a previously validated binary logistic regression classifier for detecting autism cases using categorical feature vectors. While the metrics do not incorporate any ground truth labels of child diagnosis, linear regression using the 3 correlative metrics as input can predict the mean probability of the correct class of each worker with a mean average error of 7.51% for performance on the same set of videos and 10.93% for performance on a distinct balanced video set with different children. These results indicate that crowd workers can be recruited for performance based largely on behavioral metrics on a crowdsourced task, enabling an affordable way to filter crowd workforces into a trustworthy and reliable diagnostic workforce.
Collapse
|
17
|
Feature replacement methods enable reliable home video analysis for machine learning detection of autism. Sci Rep 2020; 10:21245. [PMID: 33277527 PMCID: PMC7719177 DOI: 10.1038/s41598-020-76874-w] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Accepted: 11/02/2020] [Indexed: 12/15/2022] Open
Abstract
Autism Spectrum Disorder is a neuropsychiatric condition affecting 53 million children worldwide and for which early diagnosis is critical to the outcome of behavior therapies. Machine learning applied to features manually extracted from readily accessible videos (e.g., from smartphones) has the potential to scale this diagnostic process. However, nearly unavoidable variability in video quality can lead to missing features that degrade algorithm performance. To manage this uncertainty, we evaluated the impact of missing values and feature imputation methods on two previously published autism detection classifiers, trained on standard-of-care instrument scoresheets and tested on ratings of 140 children videos from YouTube. We compare the baseline method of listwise deletion to classic univariate and multivariate techniques. We also introduce a feature replacement method that, based on a score, selects a feature from an expanded dataset to fill-in the missing value. The replacement feature selected can be identical for all records (general) or automatically adjusted to the record considered (dynamic). Our results show that general and dynamic feature replacement methods achieve a higher performance than classic univariate and multivariate methods, supporting the hypothesis that algorithmic management can maintain the fidelity of video-based diagnostics in the face of missing values and variable video quality.
Collapse
|
18
|
Precision Telemedicine through Crowdsourced Machine Learning: Testing Variability of Crowd Workers for Video-Based Autism Feature Recognition. J Pers Med 2020; 10:E86. [PMID: 32823538 PMCID: PMC7564950 DOI: 10.3390/jpm10030086] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2020] [Revised: 08/09/2020] [Accepted: 08/10/2020] [Indexed: 02/06/2023] Open
Abstract
Mobilized telemedicine is becoming a key, and even necessary, facet of both precision health and precision medicine. In this study, we evaluate the capability and potential of a crowd of virtual workers-defined as vetted members of popular crowdsourcing platforms-to aid in the task of diagnosing autism. We evaluate workers when crowdsourcing the task of providing categorical ordinal behavioral ratings to unstructured public YouTube videos of children with autism and neurotypical controls. To evaluate emerging patterns that are consistent across independent crowds, we target workers from distinct geographic loci on two crowdsourcing platforms: an international group of workers on Amazon Mechanical Turk (MTurk) (N = 15) and Microworkers from Bangladesh (N = 56), Kenya (N = 23), and the Philippines (N = 25). We feed worker responses as input to a validated diagnostic machine learning classifier trained on clinician-filled electronic health records. We find that regardless of crowd platform or targeted country, workers vary in the average confidence of the correct diagnosis predicted by the classifier. The best worker responses produce a mean probability of the correct class above 80% and over one standard deviation above 50%, accuracy and variability on par with experts according to prior studies. There is a weak correlation between mean time spent on task and mean performance (r = 0.358, p = 0.005). These results demonstrate that while the crowd can produce accurate diagnoses, there are intrinsic differences in crowdworker ability to rate behavioral features. We propose a novel strategy for recruitment of crowdsourced workers to ensure high quality diagnostic evaluations of autism, and potentially many other pediatric behavioral health conditions. Our approach represents a viable step in the direction of crowd-based approaches for more scalable and affordable precision medicine.
Collapse
|
19
|
Data-Driven Diagnostics and the Potential of Mobile Artificial Intelligence for Digital Therapeutic Phenotyping in Computational Psychiatry. BIOLOGICAL PSYCHIATRY. COGNITIVE NEUROSCIENCE AND NEUROIMAGING 2020; 5:759-769. [PMID: 32085921 PMCID: PMC7292741 DOI: 10.1016/j.bpsc.2019.11.015] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Revised: 11/24/2019] [Accepted: 11/25/2019] [Indexed: 01/11/2023]
Abstract
Data science and digital technologies have the potential to transform diagnostic classification. Digital technologies enable the collection of big data, and advances in machine learning and artificial intelligence enable scalable, rapid, and automated classification of medical conditions. In this review, we summarize and categorize various data-driven methods for diagnostic classification. In particular, we focus on autism as an example of a challenging disorder due to its highly heterogeneous nature. We begin by describing the frontier of data science methods for the neuropsychiatry of autism. We discuss early signs of autism as defined by existing pen-and-paper-based diagnostic instruments and describe data-driven feature selection techniques for determining the behaviors that are most salient for distinguishing children with autism from neurologically typical children. We then describe data-driven detection techniques, particularly computer vision and eye tracking, that provide a means of quantifying behavioral differences between cases and controls. We also describe methods of preserving the privacy of collected videos and prior efforts of incorporating humans in the diagnostic loop. Finally, we summarize existing digital therapeutic interventions that allow for data capture and longitudinal outcome tracking as the diagnosis moves along a positive trajectory. Digital phenotyping of autism is paving the way for quantitative psychiatry more broadly and will set the stage for more scalable, accessible, and precise diagnostic techniques in the field.
Collapse
|
20
|
Development and Validation of a Comprehensive Well-Being Scale for People in the University Environment (Pitt Wellness Scale) Using a Crowdsourcing Approach: Cross-Sectional Study. J Med Internet Res 2020; 22:e15075. [PMID: 32347801 PMCID: PMC7221649 DOI: 10.2196/15075] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2019] [Revised: 02/23/2020] [Accepted: 03/19/2020] [Indexed: 01/23/2023] Open
Abstract
Background Well-being has multiple domains, and these domains are unique to the population being examined. Therefore, to precisely assess the well-being of a population, a scale specifically designed for that population is needed. Objective The goal of this study was to design and validate a comprehensive well-being scale for people in a university environment, including students, faculty, and staff. Methods A crowdsourcing approach was used to determine relevant domains for the comprehensive well-being scale in this population and identify specific questions to include in each domain. A web-based questionnaire (Q1) was used to collect opinions from a group of university students, faculty, and staff about the domains and subdomains of the scale. A draft of a new well-being scale (Q2) was created in response to the information collected via Q1, and a second group of study participants was invited to evaluate the relevance and clarity of each statement. A newly created well-being scale (Q3) was then used by a third group of university students, faculty, and staff. A psychometric analysis was performed on the data collected via Q3 to determine the validity and reliability of the well-being scale. Results In the first step, a group of 518 university community members (students, faculty, and staff) indicated the domains and subdomains that they desired to have in a comprehensive well-being scale. In the second step, a second group of 167 students, faculty, and staff evaluated the relevance and clarity of the proposed statements in each domain. In the third step, a third group of 546 students, faculty, and staff provided their responses to the new well-being scale (Pitt Wellness Scale). The psychometric analysis indicated that the reliability of the well-being scale was high. Conclusions Using a crowdsourcing approach, we successfully created a comprehensive and highly reliable well-being scale for people in the university environment. Our new Pitt Wellness Scale may be used to measure the well-being of people in the university environment.
Collapse
|
21
|
The Performance of Emotion Classifiers for Children With Parent-Reported Autism: Quantitative Feasibility Study. JMIR Ment Health 2020; 7:e13174. [PMID: 32234701 PMCID: PMC7160704 DOI: 10.2196/13174] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/18/2018] [Revised: 07/03/2019] [Accepted: 02/23/2020] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Autism spectrum disorder (ASD) is a developmental disorder characterized by deficits in social communication and interaction, and restricted and repetitive behaviors and interests. The incidence of ASD has increased in recent years; it is now estimated that approximately 1 in 40 children in the United States are affected. Due in part to increasing prevalence, access to treatment has become constrained. Hope lies in mobile solutions that provide therapy through artificial intelligence (AI) approaches, including facial and emotion detection AI models developed by mainstream cloud providers, available directly to consumers. However, these solutions may not be sufficiently trained for use in pediatric populations. OBJECTIVE Emotion classifiers available off-the-shelf to the general public through Microsoft, Amazon, Google, and Sighthound are well-suited to the pediatric population, and could be used for developing mobile therapies targeting aspects of social communication and interaction, perhaps accelerating innovation in this space. This study aimed to test these classifiers directly with image data from children with parent-reported ASD recruited through crowdsourcing. METHODS We used a mobile game called Guess What? that challenges a child to act out a series of prompts displayed on the screen of the smartphone held on the forehead of his or her care provider. The game is intended to be a fun and engaging way for the child and parent to interact socially, for example, the parent attempting to guess what emotion the child is acting out (eg, surprised, scared, or disgusted). During a 90-second game session, as many as 50 prompts are shown while the child acts, and the video records the actions and expressions of the child. Due in part to the fun nature of the game, it is a viable way to remotely engage pediatric populations, including the autism population through crowdsourcing. We recruited 21 children with ASD to play the game and gathered 2602 emotive frames following their game sessions. These data were used to evaluate the accuracy and performance of four state-of-the-art facial emotion classifiers to develop an understanding of the feasibility of these platforms for pediatric research. RESULTS All classifiers performed poorly for every evaluated emotion except happy. None of the classifiers correctly labeled over 60.18% (1566/2602) of the evaluated frames. Moreover, none of the classifiers correctly identified more than 11% (6/51) of the angry frames and 14% (10/69) of the disgust frames. CONCLUSIONS The findings suggest that commercial emotion classifiers may be insufficiently trained for use in digital approaches to autism treatment and treatment tracking. Secure, privacy-preserving methods to increase labeled training data are needed to boost the models' performance before they can be used in AI-enabled approaches to social therapy of the kind that is common in autism treatments.
Collapse
|
22
|
Feature Selection and Dimension Reduction of Social Autism Data. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020; 25:707-718. [PMID: 31797640 PMCID: PMC6927820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
Abstract
Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimension representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multilayer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. High redundancy of features have implications towards replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimension representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
Collapse
|