1. Paskov K, Chrisman B, Stockham N, Washington PY, Dunlap K, Jung JY, Wall DP. Identifying crossovers and shared genetic material in whole genome sequencing data from families. Genome Res 2023; 33:1747-1756. PMID: 37879861; PMCID: PMC10691535; DOI: 10.1101/gr.277172.122. (Received 08/02/2022; accepted 09/12/2023.)
Abstract
Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.
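The full PhasingFamilies model additionally handles error-prone regions and inherited deletions; as a minimal illustration of the core idea, the sketch below decodes a toy two-state inheritance HMM with the Viterbi algorithm, where a switch between haplotype states marks a candidate crossover. All states, probabilities, and the observation encoding here are invented for illustration and are not taken from the paper.

```python
import math

LOG0 = -1e9  # stand-in for log(0)

def logp(p):
    return math.log(p) if p > 0 else LOG0

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for an observation sequence (log-space)."""
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of back-pointers per observation after the first
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            # pick the predecessor state that maximizes the path log-probability
            p, q = max((prev[r] + log_trans[r][s], r) for r in states)
            best[s] = p + log_emit[s][o]
            ptr[s] = q
        back.append(ptr)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy model: which parental haplotype a child carries at each site; a state
# transition is a candidate crossover. All numbers are invented.
states = ("hap1", "hap2")
log_start = {s: logp(0.5) for s in states}
log_trans = {r: {s: logp(0.99 if r == s else 0.01) for s in states} for r in states}
# Observation 'm' = the child's allele matches haplotype 1, 'x' = mismatch,
# with a small genotyping-error probability blurring the emissions.
log_emit = {"hap1": {"m": logp(0.95), "x": logp(0.05)},
            "hap2": {"m": logp(0.05), "x": logp(0.95)}}
```

Decoding `mmmxxx` yields three `hap1` states followed by three `hap2` states, placing a candidate crossover between positions 3 and 4, while a lone mismatched site is absorbed as genotyping error, since one bad emission is cheaper than two rare transitions.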
Affiliations
- Kelley Paskov, Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Brianna Chrisman, Department of Bioengineering, Stanford University, Stanford, California 94305, USA
- Nathaniel Stockham, Department of Neuroscience, Stanford University, Stanford, California 94305, USA
- Kaitlyn Dunlap, Department of Biomedical Data Science and Department of Pediatrics, Stanford University, Stanford, California 94305, USA
- Jae-Yoon Jung, Department of Biomedical Data Science and Department of Pediatrics, Stanford University, Stanford, California 94305, USA
- Dennis P Wall, Department of Biomedical Data Science and Department of Pediatrics, Stanford University, Stanford, California 94305, USA
2. Stockham N, Washington P, Chrisman B, Paskov K, Jung JY, Wall DP. Causal Modeling to Mitigate Selection Bias and Unmeasured Confounding in Internet-Based Epidemiology of COVID-19: Model Development and Validation. JMIR Public Health Surveill 2022; 8:e31306. PMID: 35605128; PMCID: PMC9307267; DOI: 10.2196/31306. (Received 06/16/2021; revised 02/22/2022; accepted 05/17/2022.)
Abstract
BACKGROUND Selection bias and unmeasured confounding are fundamental problems in epidemiology that threaten the internal and external validity of studies. These phenomena are particularly dangerous in internet-based public health surveillance, where traditional mitigation and adjustment methods are inapplicable, unavailable, or out of date. Recent theoretical advances in causal modeling can mitigate these threats, but these innovations have not been widely deployed in the epidemiological community. OBJECTIVE The purpose of our paper is to demonstrate the practical utility of causal modeling to both detect unmeasured confounding and selection bias and guide model selection to minimize bias. We implemented this approach in an applied epidemiological study of the COVID-19 cumulative infection rate in the New York City (NYC) spring 2020 epidemic. METHODS We collected primary data from Qualtrics surveys of Amazon Mechanical Turk (MTurk) crowd workers residing in New Jersey and New York State across 2 sampling periods: April 11-14 and May 8-11, 2020. The surveys queried the subjects on household health status and demographic characteristics. We constructed a set of possible causal models of household infection and survey selection mechanisms and ranked them by compatibility with the collected survey data. The most compatible causal model was then used to estimate the cumulative infection rate in each survey period. RESULTS There were 527 and 513 responses collected for the 2 periods, respectively. Response demographics were highly skewed toward a younger age in both survey periods. Despite the extremely strong relationship between age and COVID-19 symptoms, we recovered minimally biased estimates of the cumulative infection rate using only primary data and the most compatible causal model, with a relative bias of +3.8% and -1.9% from the reported cumulative infection rate for the first and second survey periods, respectively.
CONCLUSIONS We successfully recovered accurate estimates of the cumulative infection rate from an internet-based crowdsourced sample despite considerable selection bias and unmeasured confounding in the primary data. This implementation demonstrates how simple applications of structural causal modeling can be effectively used to determine falsifiable model conditions, detect selection bias and confounding factors, and minimize estimate bias through model selection in a novel epidemiological context. As the disease and social dynamics of COVID-19 continue to evolve, public health surveillance protocols must continue to adapt, with the emergence of Omicron variants and the shift to at-home testing as recent challenges. Rigorous and transparent methods to develop, deploy, and diagnose adapted surveillance protocols will be critical to their success.
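The study ranks candidate causal models by their compatibility with the observed survey data. A minimal sketch of that idea, assuming binary variables and using only marginal independence checks (real structural-causal-model testing uses conditional independencies and proper statistical tests); the variables, model names, and data here are synthetic:

```python
import random

def indep_score(xs, ys):
    """|P(x=1, y=1) - P(x=1)P(y=1)|: near zero when x and y look independent."""
    n = len(xs)
    px, py = sum(xs) / n, sum(ys) / n
    pxy = sum(1 for a, b in zip(xs, ys) if a and b) / n
    return abs(pxy - px * py)

def rank_models(data, models):
    """Order candidate models by how badly the data violate the marginal
    independencies each model claims (lowest total violation first)."""
    return sorted(models, key=lambda m: sum(indep_score(data[a], data[b])
                                            for a, b in models[m]))

# Synthetic check: x and y share a hidden cause z, while w is independent noise.
random.seed(1)
n = 4000
z = [random.random() < 0.5 for _ in range(n)]
noisy = lambda v: int(v if random.random() > 0.1 else not v)  # 10% flip noise
data = {"x": [noisy(v) for v in z],
        "y": [noisy(v) for v in z],
        "w": [int(random.random() < 0.5) for _ in range(n)]}
models = {"x_indep_w": [("x", "w")], "x_indep_y": [("x", "y")]}
```

Here `x_indep_w` ranks first because its claimed independence holds in the data. Note the caveat: a model claiming no independencies at all is trivially compatible, which is one reason real analyses prefer the most constrained model among the compatible ones.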
Affiliations
- Nathaniel Stockham, Neurosciences Interdepartmental Program, Stanford University, Palo Alto, CA, United States
- Peter Washington, Department of Bioengineering, Stanford University, Stanford, CA, United States
- Brianna Chrisman, Department of Bioengineering, Stanford University, Stanford, CA, United States
- Kelley Paskov, Biomedical Informatics Program, Stanford University, Stanford, CA, United States
- Jae-Yoon Jung, Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Dennis Paul Wall, Department of Biomedical Data Science and Department of Pediatrics, Stanford University, Stanford, CA, United States
3. Lakkapragada A, Kline A, Mutlu OC, Paskov K, Chrisman B, Stockham N, Washington P, Wall DP. The Classification of Abnormal Hand Movement to Aid in Autism Detection: Machine Learning Study. JMIR Biomed Eng 2022. DOI: 10.2196/33771.
Abstract
Background
A formal autism diagnosis can be an inefficient and lengthy process. Families may wait several months or longer before receiving a diagnosis for their child despite evidence that earlier intervention leads to better treatment outcomes. Digital technologies that detect the presence of behaviors related to autism can scale access to pediatric diagnoses. A strong indicator of the presence of autism is self-stimulatory behaviors such as hand flapping.
Objective
This study aims to demonstrate the feasibility of deep learning technologies for the detection of hand flapping from unstructured home videos as a first step toward validation of whether statistical models coupled with digital technologies can be leveraged to aid in the automatic behavioral analysis of autism. To support the widespread sharing of such home videos, we explored privacy-preserving modifications to the input space via conversion of each video to hand landmark coordinates and measured the performance of corresponding time series classifiers.
Methods
We used the Self-Stimulatory Behavior Dataset (SSBD), which contains 75 videos of hand flapping, head banging, and spinning exhibited by children. From this data set, we extracted 100 hand flapping videos and 100 control videos, each between 2 and 5 seconds in duration. We evaluated five separate feature representations: four privacy-preserved subsets of hand landmarks detected by MediaPipe and one feature representation obtained from the output of the penultimate layer of a MobileNetV2 model fine-tuned on the SSBD. We fed these feature vectors into a long short-term memory network that predicted the presence of hand flapping in each video clip.
Results
The highest-performing model used MobileNetV2 to extract features and achieved a test F1 score of 84 (SD 3.7; precision 89.6, SD 4.3 and recall 80.4, SD 6) using 5-fold cross-validation for 100 random seeds on the SSBD data (500 total distinct folds). Of the models we trained on privacy-preserved data, the model trained with all hand landmarks reached an F1 score of 66.6 (SD 3.35). Another such model trained with a select 6 landmarks reached an F1 score of 68.3 (SD 3.6). A privacy-preserved model trained using a single landmark at the base of the hands and a model trained with the average of the locations of all the hand landmarks reached an F1 score of 64.9 (SD 6.5) and 64.2 (SD 6.8), respectively.
Conclusions
We created five lightweight neural networks that can detect hand flapping from unstructured videos. Training a long short-term memory network with convolutional feature vectors outperformed training with feature vectors of hand coordinates and used almost 900,000 fewer model parameters. This study provides the first step toward developing precise deep learning methods for activity detection of autism-related behaviors.
4. Washington P, Tariq Q, Leblanc E, Chrisman B, Dunlap K, Kline A, Kalantarian H, Penev Y, Paskov K, Voss C, Stockham N, Varma M, Husic A, Kent J, Haber N, Winograd T, Wall DP. Crowdsourced privacy-preserved feature tagging of short home videos for machine learning ASD detection. Sci Rep 2021; 11:7620. PMID: 33828118; PMCID: PMC8027393; DOI: 10.1038/s41598-021-87059-4. (Received 07/08/2020; accepted 03/22/2021.)
Abstract
Standard medical diagnosis of mental health conditions requires licensed experts who are increasingly outnumbered by those at risk, limiting reach. We test the hypothesis that a trustworthy crowd of non-experts can efficiently annotate behavioral features needed for accurate machine learning detection of the common childhood developmental disorder Autism Spectrum Disorder (ASD) for children under 8 years old. We implement a novel process for identifying and certifying a trustworthy distributed workforce for video feature extraction, selecting a workforce of 102 workers from a pool of 1,107. Two previously validated ASD logistic regression classifiers, evaluated against parent-reported diagnoses, were used to assess the accuracy of the trusted crowd's ratings of unstructured home videos. A representative, balanced sample of videos (N = 50) was evaluated with and without face box and pitch shift privacy alterations, with AUROC and AUPRC scores > 0.98. With both privacy-preserving modifications, sensitivity is preserved (96.0%) while maintaining specificity (80.0%) and accuracy (88.0%) at levels comparable to prior classification methods without alterations. We find that machine learning classification from features extracted by a certified nonexpert crowd achieves high performance for ASD detection from natural home videos of the child at risk and maintains high sensitivity when privacy-preserving mechanisms are applied. These results suggest that privacy-safeguarded crowdsourced analysis of short home videos can help enable rapid and mobile machine-learning detection of developmental delays in children.
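The reported AUROC summarizes how well the classifier's scores rank cases above controls. For reference (this is not the authors' code), AUROC can be computed directly from the Mann-Whitney formulation: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # count positive-vs-negative comparisons won (ties count 1/2)
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, perfectly separated scores give an AUROC of 1.0, and a single tied pair gives 0.5.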
Affiliations
- Peter Washington, Department of Bioengineering, Stanford University, Stanford, CA, USA
- Emilie Leblanc, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Brianna Chrisman, Department of Bioengineering, Stanford University, Stanford, CA, USA
- Kaitlyn Dunlap, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Aaron Kline, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Haik Kalantarian, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Yordan Penev, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Kelley Paskov, Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Catalin Voss, Department of Computer Science, Stanford University, Stanford, CA, USA
- Nathaniel Stockham, Department of Neuroscience, Stanford University, Stanford, CA, USA
- Maya Varma, Department of Computer Science, Stanford University, Stanford, CA, USA
- Arman Husic, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Jack Kent, Department of Pediatrics (Systems Medicine), Stanford University, Stanford, CA, USA
- Nick Haber, Graduate School of Education, Stanford University, Stanford, CA, USA
- Terry Winograd, Department of Computer Science, Stanford University, Stanford, CA, USA
- Dennis P. Wall, Department of Pediatrics (Systems Medicine), Department of Biomedical Data Science, and Department of Psychiatry and Behavioral Sciences (By Courtesy), Stanford University, Stanford, CA, USA
5. Washington P, Leblanc E, Dunlap K, Penev Y, Varma M, Jung JY, Chrisman B, Sun MW, Stockham N, Paskov KM, Kalantarian H, Voss C, Haber N, Wall DP. Selection of trustworthy crowd workers for telemedical diagnosis of pediatric autism spectrum disorder. Pac Symp Biocomput 2021; 26:14-25. PMID: 33691000; PMCID: PMC7958981.
Abstract
Crowd-powered telemedicine has the potential to revolutionize healthcare, especially during times that require remote access to care. However, sharing private health data with strangers from around the world is not compatible with data privacy standards, requiring a stringent filtration process to recruit reliable and trustworthy workers who can go through the proper training and security steps. The key challenge, then, is to identify capable, trustworthy, and reliable workers through high-fidelity evaluation tasks without exposing any sensitive patient data during the evaluation process. We contribute a set of experimentally validated metrics for assessing the trustworthiness and reliability of crowd workers tasked with providing behavioral feature tags to unstructured videos of children with autism and matched neurotypical controls. The workers are blinded to diagnosis and blinded to the goal of using the features to diagnose autism. These behavioral labels are fed as input to a previously validated binary logistic regression classifier for detecting autism cases using categorical feature vectors. While the metrics do not incorporate any ground truth labels of child diagnosis, linear regression using the 3 correlative metrics as input can predict the mean probability of the correct class of each worker with a mean average error of 7.51% for performance on the same set of videos and 10.93% for performance on a distinct balanced video set with different children. These results indicate that crowd workers can be recruited for performance based largely on behavioral metrics on a crowdsourced task, enabling an affordable way to filter crowd workforces into a trustworthy and reliable diagnostic workforce.
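The worker-selection step fits a linear regression from the 3 correlative metrics to each worker's mean probability of the correct class and evaluates it by mean error. A self-contained sketch of that pipeline (ordinary least squares via the normal equations; the feature values below are synthetic, not the study's metrics):

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ols(X, y):
    """Least-squares weights (intercept first) via the normal equations."""
    Xa = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(Xa, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict(w, row):
    return w[0] + sum(a * b for a, b in zip(w[1:], row))

def mae(y_true, y_pred):
    """Mean absolute error between predictions and targets."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

On data generated by an exact linear rule, `fit_ols` recovers the weights and `mae` returns (numerically) zero; on real worker metrics, the residual error plays the role of the study's reported ~7.5% and ~10.9% figures.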
Affiliations
- Peter Washington, Department of Bioengineering, Stanford University, Palo Alto, CA 94305, USA
- Emilie Leblanc, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Kaitlyn Dunlap, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Yordan Penev, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Maya Varma, Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA
- Jae-Yoon Jung, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Brianna Chrisman, Department of Bioengineering, Stanford University, Palo Alto, CA 94305, USA
- Min Woo Sun, Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Nathaniel Stockham, Department of Neuroscience, Stanford University, Palo Alto, CA 94305, USA
- Kelley Marie Paskov, Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Haik Kalantarian, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
- Catalin Voss, Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA
- Nick Haber, School of Education, Stanford University, Palo Alto, CA 94305, USA
- Dennis P. Wall, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Palo Alto, CA 94305, USA
6. Washington P, Leblanc E, Dunlap K, Penev Y, Kline A, Paskov K, Sun MW, Chrisman B, Stockham N, Varma M, Voss C, Haber N, Wall DP. Precision Telemedicine through Crowdsourced Machine Learning: Testing Variability of Crowd Workers for Video-Based Autism Feature Recognition. J Pers Med 2020; 10:E86. PMID: 32823538; PMCID: PMC7564950; DOI: 10.3390/jpm10030086. (Received 06/29/2020; revised 08/09/2020; accepted 08/10/2020.)
Abstract
Mobilized telemedicine is becoming a key, and even necessary, facet of both precision health and precision medicine. In this study, we evaluate the capability and potential of a crowd of virtual workers, defined as vetted members of popular crowdsourcing platforms, to aid in the task of diagnosing autism. We evaluate workers when crowdsourcing the task of providing categorical ordinal behavioral ratings to unstructured public YouTube videos of children with autism and neurotypical controls. To evaluate emerging patterns that are consistent across independent crowds, we target workers from distinct geographic loci on two crowdsourcing platforms: an international group of workers on Amazon Mechanical Turk (MTurk) (N = 15) and workers on Microworkers from Bangladesh (N = 56), Kenya (N = 23), and the Philippines (N = 25). We feed worker responses as input to a validated diagnostic machine learning classifier trained on clinician-filled electronic health records. We find that regardless of crowd platform or targeted country, workers vary in the average confidence of the correct diagnosis predicted by the classifier. The best worker responses produce a mean probability of the correct class above 80% and over one standard deviation above 50%, accuracy and variability on par with experts according to prior studies. There is a weak correlation between mean time spent on task and mean performance (r = 0.358, p = 0.005). These results demonstrate that while the crowd can produce accurate diagnoses, there are intrinsic differences in crowdworker ability to rate behavioral features. We propose a novel strategy for recruitment of crowdsourced workers to ensure high quality diagnostic evaluations of autism, and potentially many other pediatric behavioral health conditions. Our approach represents a viable step in the direction of crowd-based approaches for more scalable and affordable precision medicine.
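The reported time-on-task association (r = 0.358) is a Pearson correlation. For reference, the coefficient itself is a short computation (the p-value is conventionally obtained separately, e.g. from a t-test with n - 2 degrees of freedom):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly linear data give r = ±1; partially ordered data land in between, as in the study's weak positive correlation.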
Affiliations
- Peter Washington, Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, USA
- Emilie Leblanc, Department of Pediatrics (Systems Medicine), Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
- Kaitlyn Dunlap, Department of Pediatrics (Systems Medicine), Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
- Yordan Penev, Department of Pediatrics (Systems Medicine), Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
- Aaron Kline, Department of Pediatrics (Systems Medicine), Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
- Kelley Paskov, Department of Biomedical Data Science, Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
- Min Woo Sun, Department of Biomedical Data Science, Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
- Brianna Chrisman, Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA 94305, USA
- Nathaniel Stockham, Department of Neuroscience, Stanford University, 213 Quarry Rd., Stanford, CA 94305, USA
- Maya Varma, Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA
- Catalin Voss, Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA
- Nick Haber, School of Education, Stanford University, 485 Lasuen Mall, Stanford, CA 94305, USA
- Dennis P. Wall, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, 1265 Welch Rd., Stanford, CA 94305, USA
7. Washington P, Park N, Srivastava P, Voss C, Kline A, Varma M, Tariq Q, Kalantarian H, Schwartz J, Patnaik R, Chrisman B, Stockham N, Paskov K, Haber N, Wall DP. Data-Driven Diagnostics and the Potential of Mobile Artificial Intelligence for Digital Therapeutic Phenotyping in Computational Psychiatry. Biol Psychiatry Cogn Neurosci Neuroimaging 2020; 5:759-769. PMID: 32085921; PMCID: PMC7292741; DOI: 10.1016/j.bpsc.2019.11.015. (Received 08/20/2019; revised 11/24/2019; accepted 11/25/2019.)
Abstract
Data science and digital technologies have the potential to transform diagnostic classification. Digital technologies enable the collection of big data, and advances in machine learning and artificial intelligence enable scalable, rapid, and automated classification of medical conditions. In this review, we summarize and categorize various data-driven methods for diagnostic classification. In particular, we focus on autism as an example of a challenging disorder due to its highly heterogeneous nature. We begin by describing the frontier of data science methods for the neuropsychiatry of autism. We discuss early signs of autism as defined by existing pen-and-paper-based diagnostic instruments and describe data-driven feature selection techniques for determining the behaviors that are most salient for distinguishing children with autism from neurologically typical children. We then describe data-driven detection techniques, particularly computer vision and eye tracking, that provide a means of quantifying behavioral differences between cases and controls. We also describe methods of preserving the privacy of collected videos and prior efforts of incorporating humans in the diagnostic loop. Finally, we summarize existing digital therapeutic interventions that allow for data capture and longitudinal outcome tracking as the diagnosis moves along a positive trajectory. Digital phenotyping of autism is paving the way for quantitative psychiatry more broadly and will set the stage for more scalable, accessible, and precise diagnostic techniques in the field.
Affiliations
- Peter Washington, Department of Bioengineering, Stanford University, Stanford, California
- Natalie Park, Department of Biological Sciences, Columbia University, New York, New York
- Parishkrita Srivastava, Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California
- Catalin Voss, Department of Computer Science, Stanford University, Stanford, California
- Aaron Kline, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Stanford, California
- Maya Varma, Department of Computer Science, Stanford University, Stanford, California
- Qandeel Tariq, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Stanford, California
- Haik Kalantarian, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Stanford, California
- Jessey Schwartz, Department of Pediatrics (Systems Medicine) and Department of Biomedical Data Science, Stanford University, Stanford, California
- Ritik Patnaik, Department of Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Brianna Chrisman, Department of Bioengineering, Stanford University, Stanford, California
- Kelley Paskov, Department of Biomedical Data Science, Stanford University, Stanford, California
- Nick Haber, School of Education, Stanford University, Stanford, California
- Dennis P Wall, Department of Pediatrics (Systems Medicine), Department of Biomedical Data Science, and Department of Psychiatry and Behavioral Sciences (by courtesy), Stanford University, Stanford, California
8. Washington P, Paskov KM, Kalantarian H, Stockham N, Voss C, Kline A, Patnaik R, Chrisman B, Varma M, Tariq Q, Dunlap K, Schwartz J, Haber N, Wall DP. Feature Selection and Dimension Reduction of Social Autism Data. Pac Symp Biocomput 2020; 25:707-718. PMID: 31797640; PMCID: PMC6927820.
Abstract
Autism Spectrum Disorder (ASD) is a complex neuropsychiatric condition with a highly heterogeneous phenotype. Following the work of Duda et al., which uses a reduced feature set from the Social Responsiveness Scale, Second Edition (SRS) to distinguish ASD from ADHD, we performed item-level question selection on answers to the SRS to determine whether ASD can be distinguished from non-ASD using a similarly small subset of questions. To explore feature redundancies between the SRS questions, we performed filter, wrapper, and embedded feature selection analyses. To explore the linearity of the SRS-related ASD phenotype, we then compressed the 65-question SRS into low-dimension representations using PCA, t-SNE, and a denoising autoencoder. We measured the performance of a multilayer perceptron (MLP) classifier with the top-ranking questions as input. Classification using only the top-rated question resulted in an AUC of over 92% for SRS-derived diagnoses and an AUC of over 83% for dataset-specific diagnoses. This high redundancy of features has implications for replacing the social behaviors that are targeted in behavioral diagnostics and interventions, where digital quantification of certain features may be obfuscated due to privacy concerns. We similarly evaluated the performance of an MLP classifier trained on the low-dimension representations of the SRS, finding that the denoising autoencoder achieved slightly higher performance than the PCA and t-SNE representations.
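PCA is one of the three compression methods compared. A minimal pure-Python sketch of extracting the leading principal component by power iteration (not the authors' implementation, which presumably used standard libraries, and limited here to the top component for brevity):

```python
def top_component(rows, iters=200):
    """Leading principal component of a list of row-vectors via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # center the data
    v = [1.0] * d
    for _ in range(iters):
        # one multiply by X^T X, i.e. the (unnormalized) covariance matrix
        proj = [sum(x * c for x, c in zip(row, v)) for row in X]
        w = [sum(p * row[j] for p, row in zip(proj, X)) for j in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v, means

def project(rows, v, means):
    """1-D PCA scores: centered rows projected onto the component."""
    return [sum((x - m) * c for x, m, c in zip(row, means, v)) for row in rows]
```

Applied to 65-dimensional SRS answer vectors, `project` would yield the kind of low-dimension representation the MLP classifier was trained on (the paper's PCA kept more than one component).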
Affiliations
- Peter Washington, Department of Bioengineering, Stanford University, Palo Alto, CA, USA
9. Washington P, Kalantarian H, Tariq Q, Schwartz J, Dunlap K, Chrisman B, Varma M, Ning M, Kline A, Stockham N, Paskov K, Voss C, Haber N, Wall DP. Addendum to the Acknowledgements: Validity of Online Screening for Autism: Crowdsourcing Study Comparing Paid and Unpaid Diagnostic Tasks. J Med Internet Res 2019; 21:e14950. PMID: 31250828; PMCID: PMC6620884; DOI: 10.2196/14950. (Received 06/06/2019; accepted 06/13/2019.)
Affiliation(s)
- Peter Washington
- Department of Bioengineering, Stanford University, Stanford, CA, United States
- Haik Kalantarian
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Qandeel Tariq
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Jessey Schwartz
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Kaitlyn Dunlap
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Brianna Chrisman
- Department of Bioengineering, Stanford University, Stanford, CA, United States
- Maya Varma
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Michael Ning
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Aaron Kline
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Kelley Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Catalin Voss
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Nick Haber
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Department of Pediatrics, Stanford University, Stanford, CA, United States
- Department of Psychology, Stanford University, Stanford, CA, United States
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, United States
- Dennis Paul Wall
- Department of Pediatrics, Stanford University, Stanford, CA, United States
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, United States
- Division of Systems Medicine, Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
|
10
|
Washington P, Kalantarian H, Tariq Q, Schwartz J, Dunlap K, Chrisman B, Varma M, Ning M, Kline A, Stockham N, Paskov K, Voss C, Haber N, Wall DP. Validity of Online Screening for Autism: Crowdsourcing Study Comparing Paid and Unpaid Diagnostic Tasks. J Med Internet Res 2019; 21:e13668. [PMID: 31124463 PMCID: PMC6552453 DOI: 10.2196/13668] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2019] [Revised: 04/15/2019] [Accepted: 04/16/2019] [Indexed: 12/05/2022] Open
Abstract
Background Obtaining a diagnosis of neuropsychiatric disorders such as autism requires long waiting times that can exceed a year and can be prohibitively expensive. Crowdsourcing approaches may provide a scalable alternative that can accelerate general access to care and permit underserved populations to obtain an accurate diagnosis. Objective We aimed to perform a series of studies to explore whether paid crowd workers on Amazon Mechanical Turk (AMT) and citizen crowd workers on a public website shared on social media can provide accurate online detection of autism, conducted via crowdsourced ratings of short home video clips. Methods Three online studies were performed: (1) a paid crowdsourcing task on AMT (N=54) where crowd workers were asked to classify 10 short video clips of children as “Autism” or “Not autism,” (2) a more complex paid crowdsourcing task (N=27) with only those raters who correctly rated ≥8 of the 10 videos during the first study, and (3) a public unpaid study (N=115) identical to the first study. Results For Study 1, the mean score of the participants who completed all questions was 7.50/10 (SD 1.46). When only analyzing the workers who scored ≥8/10 (n=27/54), there was a weak negative correlation between the time spent rating the videos and the sensitivity (ρ=–0.44, P=.02). For Study 2, the mean score of the participants rating new videos was 6.76/10 (SD 0.59). The average deviation between the crowdsourced answers and gold standard ratings provided by two expert clinical research coordinators was 0.56, with an SD of 0.51 (maximum possible SD is 3). All paid crowd workers who scored 8/10 in Study 1 either expressed enjoyment in performing the task in Study 2 or provided no negative comments. For Study 3, the mean score of the participants who completed all questions was 6.67/10 (SD 1.61). There were weak correlations between age and score (r=0.22, P=.014), age and sensitivity (r=–0.19, P=.04), number of family members with autism and sensitivity (r=–0.195, P=.04), and number of family members with autism and precision (r=–0.203, P=.03). A two-tailed t test between the scores of the paid workers in Study 1 and the unpaid workers in Study 3 showed a significant difference (P<.001). Conclusions Many paid crowd workers on AMT enjoyed answering screening questions from videos, suggesting higher intrinsic motivation to make quality assessments. Paid crowdsourcing provides promising screening assessments of pediatric autism with an average deviation <20% from professional gold standard raters, which is potentially a clinically informative estimate for parents. Parents of children with autism likely overfit their intuition to their own affected child. This work provides preliminary demographic data on raters who may have higher ability to recognize and measure features of autism across its wide range of phenotypic manifestations.
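The "average deviation" agreement metric reported in the Results can be sketched as follows: for each video, compare a crowd worker's rating against the mean of the two gold-standard clinical ratings, then average the absolute differences. The rating scale and all numbers below are illustrative assumptions, not the study's actual data.

```python
def average_deviation(crowd, gold_a, gold_b):
    """Mean |crowd rating - mean gold rating| across videos.

    crowd, gold_a, gold_b: equal-length lists of per-video ratings
    from the crowd worker and two gold-standard raters, on a shared
    ordinal scale.
    """
    gold = [(a + b) / 2 for a, b in zip(gold_a, gold_b)]
    devs = [abs(c - g) for c, g in zip(crowd, gold)]
    return sum(devs) / len(devs)

# Illustrative ratings for 5 videos from one crowd worker
# and two gold-standard raters.
crowd  = [6, 2, 5, 1, 7]
gold_a = [6, 1, 5, 2, 6]
gold_b = [6, 3, 4, 2, 7]
print(average_deviation(crowd, gold_a, gold_b))  # → 0.4
```

Averaging this quantity over all workers yields a single agreement score comparable to the 0.56 figure the abstract reports.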
Affiliation(s)
- Peter Washington
- Department of Bioengineering, Stanford University, Stanford, CA, United States
- Haik Kalantarian
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Qandeel Tariq
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Jessey Schwartz
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Kaitlyn Dunlap
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Brianna Chrisman
- Department of Bioengineering, Stanford University, Stanford, CA, United States
- Maya Varma
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Michael Ning
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Aaron Kline
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Nathaniel Stockham
- Department of Neuroscience, Stanford University, Stanford, CA, United States
- Kelley Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Catalin Voss
- Department of Computer Science, Stanford University, Stanford, CA, United States
- Nick Haber
- Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Department of Pediatrics, Stanford University, Stanford, CA, United States
- Department of Psychology, Stanford University, Stanford, CA, United States
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, United States
- Dennis Paul Wall
- Department of Pediatrics, Stanford University, Stanford, CA, United States
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, United States
- Division of Systems Medicine, Department of Biomedical Data Science, Stanford University, Palo Alto, CA, United States
|