1
Bobrovnikov M, Chai JT, Dinov ID. Interactive Visualization and Computation of 2D and 3D Probability Distributions. SN COMPUTER SCIENCE 2022; 3:327. [PMID: 37483660 PMCID: PMC10361712 DOI: 10.1007/s42979-022-01206-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 05/13/2022] [Indexed: 07/25/2023]
Abstract
Purpose: Mathematical modeling, probability estimation, and statistical inference represent core elements of modern artificial intelligence (AI) approaches for data-driven prediction, forecasting, classification, risk-estimation, and prognosis. Many tools currently help calculate and visualize univariate probability distributions; however, very few resources venture beyond them into multivariate distributions, which are commonly used in advanced statistical inference and AI decision-making. This article presents a new web-calculator that enables some calculation and visualization of bivariate and trivariate probability distributions. Methods: Several methods are explored to compute the joint bivariate and trivariate probability densities, including the optimal multivariate modeling using Gaussian copula. We developed an interactive webapp to visually illustrate the parallels between the mathematical formulation, computational implementation, and graphical depiction of multivariate probability density and cumulative distribution functions. To ensure the interface and functionality are hardware-platform independent, scalable, and functional, the app and its component widgets are implemented using HTML5 and JavaScript. Results: We validated the webapp by testing the multivariate copula models under different experimental conditions and inspecting the performance in terms of accuracy and reliability of the estimated multivariate probability densities and distribution function values. Conclusion: This article demonstrates the construction, implementation, and utilization of multivariate probability calculators. The proposed webapp implementation is freely available online (https://socr.umich.edu/HTML5/BivariateNormal/BVN2/) and can be used to assist with the education and research of a diverse array of data scientists, STEM instructors, and AI learners.
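The abstract names the Gaussian copula as one route to a joint bivariate density. The following is a minimal Python sketch of that general technique, not the webapp's own JavaScript implementation: the function names (`gaussian_copula_density`, `joint_density`, `bivariate_normal_pdf`) are hypothetical, and the choice of normal marginals is an assumption made only so the result can be checked against the closed-form bivariate normal density.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the probit transform


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for 0 < u, v < 1."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    expo = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus)
    return math.exp(expo) / math.sqrt(one_minus)


def joint_density(a, b, marg_x, marg_y, rho):
    """Joint density f(a, b) = c(F1(a), F2(b)) * f1(a) * f2(b).

    marg_x and marg_y are assumed to expose cdf() and pdf(),
    as statistics.NormalDist does.
    """
    u, v = marg_x.cdf(a), marg_y.cdf(b)
    return gaussian_copula_density(u, v, rho) * marg_x.pdf(a) * marg_y.pdf(b)


def bivariate_normal_pdf(a, b, mx, sx, my, sy, rho):
    """Closed-form bivariate normal density, used here only as a cross-check."""
    zx, zy = (a - mx) / sx, (b - my) / sy
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    return math.exp(-q / 2.0) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))


# Illustrative parameters (assumptions, not values from the article).
via_copula = joint_density(1.0, 2.0, NormalDist(0.0, 1.0), NormalDist(1.0, 2.0), 0.5)
closed_form = bivariate_normal_pdf(1.0, 2.0, 0.0, 1.0, 1.0, 2.0, 0.5)
```

With normal marginals the two values should agree up to floating-point error; swapping in other marginal distributions keeps the same dependence structure while changing the marginal shapes.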
Affiliation(s)
- Mark Bobrovnikov
  - Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI 48109, USA https://socr.umich.edu
- Jared Tianyi Chai
  - Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI 48109, USA https://socr.umich.edu
- Ivo D. Dinov
  - Statistics Online Computational Resource (SOCR), University of Michigan, Ann Arbor, MI 48109, USA https://socr.umich.edu
2
Gao C, Sun H, Wang T, Tang M, Bohnen NI, Müller MLTM, Herman T, Giladi N, Kalinin A, Spino C, Dauer W, Hausdorff JM, Dinov ID. Model-based and Model-free Machine Learning Techniques for Diagnostic Prediction and Classification of Clinical Outcomes in Parkinson's Disease. Sci Rep 2018; 8:7129. [PMID: 29740058 PMCID: PMC5940671 DOI: 10.1038/s41598-018-24783-4] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 01/31/2018] [Accepted: 04/10/2018] [Indexed: 01/08/2023]
Abstract
In this study, we apply a multidisciplinary approach to investigate falls in Parkinson's disease (PD) patients using clinical, demographic and neuroimaging data from two independent initiatives (University of Michigan and Tel Aviv Sourasky Medical Center). Using machine learning techniques, we construct predictive models to discriminate fallers and non-fallers. Through controlled feature selection, we identified the most salient predictors of patient falls, including gait speed, Hoehn and Yahr stage, and measurements related to postural instability and gait difficulty. The model-based and model-free analytical methods we employed included logistic regression, random forests, support vector machines, and XGBoost. The reliability of the forecasts was assessed by internal statistical (5-fold) cross validation as well as by external out-of-bag validation. Four specific challenges were addressed in the study: Challenge 1, develop a protocol for harmonizing and aggregating complex, multisource, and multi-site Parkinson's disease data; Challenge 2, identify salient predictive features associated with specific clinical traits, e.g., patient falls; Challenge 3, forecast patient falls and evaluate the classification performance; and Challenge 4, predict tremor dominance (TD) vs. posture instability and gait difficulty (PIGD). Our findings suggest that, compared to other approaches, model-free machine-learning-based techniques provide more reliable clinical outcome forecasting of falls in Parkinson's patients, for example, with a classification accuracy of about 70-80%.
Affiliation(s)
- Chao Gao
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Hanbo Sun
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Statistics, University of Michigan, Ann Arbor, MI, United States
- Tuo Wang
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Statistics, University of Michigan, Ann Arbor, MI, United States
- Ming Tang
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, United States
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
- Nicolaas I Bohnen
  - Department of Radiology, University of Michigan, Ann Arbor, MI, United States
  - Department of Neurology and Ann Arbor VA Medical Center, University of Michigan, Ann Arbor, MI, United States
  - Morris K. Udall Center of Excellence for Parkinson's Disease Research, University of Michigan, Ann Arbor, MI, United States
- Martijn L T M Müller
  - Department of Radiology, University of Michigan, Ann Arbor, MI, United States
  - Department of Neurology and Ann Arbor VA Medical Center, University of Michigan, Ann Arbor, MI, United States
  - Morris K. Udall Center of Excellence for Parkinson's Disease Research, University of Michigan, Ann Arbor, MI, United States
- Talia Herman
  - The Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Nir Giladi
  - The Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Department of Neurology and Sieratzki Chair in Neurology, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Alexandr Kalinin
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, United States
  - Morris K. Udall Center of Excellence for Parkinson's Disease Research, University of Michigan, Ann Arbor, MI, United States
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Cathie Spino
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Morris K. Udall Center of Excellence for Parkinson's Disease Research, University of Michigan, Ann Arbor, MI, United States
- William Dauer
  - Department of Neurology and Ann Arbor VA Medical Center, University of Michigan, Ann Arbor, MI, United States
  - Morris K. Udall Center of Excellence for Parkinson's Disease Research, University of Michigan, Ann Arbor, MI, United States
- Jeffrey M Hausdorff
  - The Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
  - Sagol School of Neuroscience and Department of Physical Therapy, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Rush Alzheimer's Disease Center & Orthopaedic Surgery, Rush University, Chicago, IL, USA
- Ivo D Dinov
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, United States
  - Morris K. Udall Center of Excellence for Parkinson's Disease Research, University of Michigan, Ann Arbor, MI, United States
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
3
Dinov ID, Kamino S, Bhakhrani B, Christou N. Technology-enhanced Interactive Teaching of Marginal, Joint and Conditional Probabilities: The Special Case of Bivariate Normal Distribution. TEACHING STATISTICS 2013; 35:131-139. [PMID: 25419016 PMCID: PMC4238889 DOI: 10.1111/test.12012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/01/2023]
Abstract
Data analysis requires subtle probability reasoning to answer questions like "What is the chance of event A occurring, given that event B was observed?" This generic question arises in discussions of many intriguing scientific questions such as "What is the probability that an adolescent weighs between 120 and 140 pounds given that they are of average height?" and "What is the probability of (monetary) inflation exceeding 4% and housing price index below 110?" To address such problems, learning some applied, theoretical or cross-disciplinary probability concepts is necessary. Teaching such courses can be improved by utilizing modern information technology resources. Students' understanding of multivariate distributions, conditional probabilities, correlation and causation can be significantly strengthened by employing interactive web-based science educational resources. Regardless of the type of probability course (e.g. majors, minors or service probability course, rigorous measure-theoretic, applied or statistics course), student motivation, learning experiences and knowledge retention may be enhanced by blending modern technological tools within the classical conceptual pedagogical models. We have designed, implemented and disseminated a portable open-source web-application for teaching multivariate distributions, marginal, joint and conditional probabilities using the special case of bivariate Normal distribution. A real adolescent height and weight dataset is used to demonstrate the classroom utilization of the new web-application to address problems of parameter estimation, univariate and multivariate inference.
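The weight-given-height question in the abstract can be illustrated with the standard conditional distribution of a bivariate normal: given X = x, Y is normal with mean μ_Y + ρ(σ_Y/σ_X)(x − μ_X) and standard deviation σ_Y√(1 − ρ²). The Python sketch below assumes this textbook result; the function names and all numeric parameters are hypothetical placeholders, not estimates from the article's adolescent dataset, and the code is not the web-application's implementation.

```python
from math import sqrt
from statistics import NormalDist


def conditional_normal(x, mu_x, sd_x, mu_y, sd_y, rho):
    """Distribution of Y given X = x when (X, Y) is bivariate normal."""
    mean = mu_y + rho * (sd_y / sd_x) * (x - mu_x)
    sd = sd_y * sqrt(1.0 - rho * rho)
    return NormalDist(mean, sd)


def conditional_interval_prob(lo, hi, x, mu_x, sd_x, mu_y, sd_y, rho):
    """P(lo < Y < hi | X = x) under the bivariate normal model."""
    dist = conditional_normal(x, mu_x, sd_x, mu_y, sd_y, rho)
    return dist.cdf(hi) - dist.cdf(lo)


# Hypothetical parameters: height (inches) ~ N(68, 2), weight (pounds) ~ N(127, 12),
# correlation 0.5. These are illustrative assumptions only.
p = conditional_interval_prob(120, 140, x=68, mu_x=68, sd_x=2,
                              mu_y=127, sd_y=12, rho=0.5)
```

Conditioning on average height leaves the mean weight unchanged but shrinks the spread by the factor √(1 − ρ²), so the conditional interval probability differs from the marginal one whenever ρ is nonzero.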
Affiliation(s)
- Ivo D Dinov
  - Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Scott Kamino
  - Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Bilal Bhakhrani
  - Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nicolas Christou
  - Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095, USA
4
Dinov ID, Christou N. Web-based tools for modelling and analysis of multivariate data: California ozone pollution activity. INTERNATIONAL JOURNAL OF MATHEMATICAL EDUCATION IN SCIENCE AND TECHNOLOGY 2011; 42:789-829. [PMID: 24465054 PMCID: PMC3901438 DOI: 10.1080/0020739x.2011.562315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
This article presents a hands-on web-based activity motivated by the relation between human health and ozone pollution in California. This case study is based on multivariate data collected monthly at 20 locations in California between 1980 and 2006. Several strategies and tools for data interrogation and exploratory data analysis, model fitting and statistical inference on these data are presented. All components of this case study (data, tools, activity) are freely available online at: http://wiki.stat.ucla.edu/socr/index.php/SOCR_MotionCharts_CAOzoneData. Several types of exploratory (motion charts, box-and-whisker plots, spider charts) and quantitative (inference, regression, analysis of variance (ANOVA)) data analysis tools are demonstrated. Two specific human-health-related questions (temporal and geographic effects of ozone pollution) are discussed as motivational challenges.
Affiliation(s)
- Ivo D. Dinov
  - Statistics Online Computational Resource, University of California, 8125 Mathematical Science Building, Los Angeles, CA 90095, USA
  - Center for Computational Biology, University of California, 8125 Mathematical Science Building, Los Angeles, CA 90095, USA
- Nicolas Christou
  - Statistics Online Computational Resource, University of California, 8125 Mathematical Science Building, Los Angeles, CA 90095, USA
5
Christou N, Dinov ID. A Study of Students' Learning Styles, Discipline Attitudes and Knowledge Acquisition in Technology-Enhanced Probability and Statistics Education. JOURNAL OF ONLINE LEARNING AND TEACHING 2010; 6:http://jolt.merlot.org/vol6no3/dinov_0910.htm. [PMID: 21603097 PMCID: PMC3098746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Many modern technological advances have a direct impact on the format, style and efficacy of delivery and consumption of educational content. For example, various novel communication and information technology tools and resources enable efficient, timely, interactive and graphical demonstrations of diverse scientific concepts. In this manuscript, we report on a meta-study of 3 controlled experiments using the Statistics Online Computational Resource (SOCR) in probability and statistics courses. Web-accessible SOCR applets, demonstrations, simulations and virtual experiments were used in different courses as treatment and compared to matched control classes utilizing traditional pedagogical approaches. The qualitative and quantitative data we collected for all courses included the Felder-Silverman-Soloman index of learning styles, background assessment, pre- and post-surveys of attitude towards the subject, an end-point satisfaction survey, and a variety of quiz, laboratory and test scores. Our findings indicate that students' learning styles and attitudes towards a discipline may be important confounds of their final quantitative performance. The observed positive effects of integrating information technology with established pedagogical techniques may be valid across disciplines within the broader spectrum of courses in the science education curriculum. The two critical components of improving science education via blended instruction are instructor training and the development of appropriate activities, simulations and interactive resources.
Affiliation(s)
- Nicolas Christou
  - Statistics Online Computational Resource, University of California, Los Angeles, Los Angeles, CA 90095, USA
6
Al-Aziz J, Christou N, Dinov ID. SOCR Motion Charts: An Efficient, Open-Source, Interactive and Dynamic Applet for Visualizing Longitudinal Multivariate Data. JOURNAL OF STATISTICS EDUCATION : AN INTERNATIONAL JOURNAL ON THE TEACHING AND LEARNING OF STATISTICS 2010; 18:v18n3/dinov. [PMID: 21479108 PMCID: PMC3071754 DOI: 10.1080/10691898.2010.11889581] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
The amount, complexity and provenance of data have dramatically increased in the past five years. Visualization of observed and simulated data is a critical component of any social, environmental, biomedical or scientific quest. Dynamic, exploratory and interactive visualization of multivariate data, without preprocessing by dimensionality reduction, remains a nearly insurmountable challenge. The Statistics Online Computational Resource (www.SOCR.ucla.edu) provides portable online aids for probability and statistics education, technology-based instruction and statistical computing. We have developed a new Java-based infrastructure, SOCR Motion Charts, for discovery-based exploratory analysis of multivariate data. This interactive data visualization tool enables the visualization of high-dimensional longitudinal data. SOCR Motion Charts allows mapping of ordinal, nominal and quantitative variables onto time, 2D axes, size, colors, glyphs and appearance characteristics, which facilitates the interactive display of multidimensional data. We validated this new visualization paradigm using several publicly available multivariate datasets including Ice-Thickness, Housing Prices, Consumer Price Index, and California Ozone Data. SOCR Motion Charts is designed using object-oriented programming, implemented as a Java Web-applet and is available to the entire community on the web at www.socr.ucla.edu/SOCR_MotionCharts. It can be used as an instructional tool for rendering and interrogating high-dimensional data in the classroom, as well as a research tool for exploratory data analysis.
Affiliation(s)
- Jameel Al-Aziz
  - Statistics Online Computational Resource, Department of Computer Science and Engineering, University of California, Los Angeles, Los Angeles, CA 90095