1
Restrepo D, Wu C, Vásquez-Venegas C, Nakayama LF, Celi LA, López DM. DF-DM: A foundational process model for multimodal data fusion in the artificial intelligence era. RESEARCH SQUARE 2024:rs.3.rs-4277992. [PMID: 38746100 PMCID: PMC11092829 DOI: 10.21203/rs.3.rs-4277992/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
In the big data era, integrating diverse data modalities poses significant challenges, particularly in complex fields like healthcare. This paper introduces a new process model for multimodal Data Fusion for Data Mining, integrating embeddings and the Cross-Industry Standard Process for Data Mining with the existing Data Fusion Information Group model. Our model aims to decrease computational costs, complexity, and bias while improving efficiency and reliability. We also propose "disentangled dense fusion," a novel embedding fusion method designed to optimize mutual information and facilitate dense inter-modality feature interaction, thereby minimizing redundant information. We demonstrate the model's efficacy through three use cases: predicting diabetic retinopathy using retinal images and patient metadata, domestic violence prediction employing satellite imagery, internet, and census data, and identifying clinical and demographic features from radiography images and clinical notes. The model achieved a Macro F1 score of 0.92 in diabetic retinopathy prediction, an R-squared of 0.854 and sMAPE of 24.868 in domestic violence prediction, and a macro AUC of 0.92 and 0.99 for disease prediction and sex classification, respectively, in radiological analysis. These results underscore the Data Fusion for Data Mining model's potential to significantly impact multimodal data processing, promoting its adoption in diverse, resource-constrained settings.
Affiliation(s)
- David Restrepo
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Departamento de Telemática, Universidad del Cauca, Popayán, Cauca, Colombia
- Chenwei Wu
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Luis Filipe Nakayama
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Ophthalmology, São Paulo Federal University, São Paulo, São Paulo, Brazil
- Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Diego M López
- Departamento de Telemática, Universidad del Cauca, Popayán, Cauca, Colombia
2
Chadaga K, Prabhu S, Bhat V, Sampathila N, Umakanth S, Upadya P S. COVID-19 diagnosis using clinical markers and multiple explainable artificial intelligence approaches: A case study from Ecuador. SLAS Technol 2023; 28:393-410. [PMID: 37689365 DOI: 10.1016/j.slast.2023.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/21/2023] [Revised: 08/16/2023] [Accepted: 09/06/2023] [Indexed: 09/11/2023]
Abstract
The COVID-19 pandemic erupted at the beginning of 2020 and proved fatal, causing many casualties worldwide. Immediate and precise screening of affected patients is critical for disease control. COVID-19 is often confused with various other respiratory disorders since the symptoms are similar. As of today, the reverse transcription-polymerase chain reaction (RT-PCR) test is utilized for diagnosing COVID-19. However, this approach is sometimes prone to producing erroneous and false negative results. Hence, finding a reliable diagnostic method that can validate the RT-PCR test results is crucial. Artificial intelligence (AI) and machine learning (ML) applications in COVID-19 diagnosis have proven to be beneficial. Hence, clinical markers have been utilized for COVID-19 diagnosis with the help of several classifiers in this study. Further, five different explainable artificial intelligence techniques have been utilized to interpret the predictions. Among all the algorithms, the k-nearest neighbor obtained the best performance with an accuracy, precision, recall and F1-score of 84%, 85%, 84% and 84%, respectively. According to this study, the combination of clinical markers such as eosinophils, lymphocytes, red blood cells and leukocytes was significant in differentiating COVID-19. The classifiers can be utilized synchronously with the standard RT-PCR procedure, making diagnosis more reliable and efficient.
Affiliation(s)
- Krishnaraj Chadaga
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Srikanth Prabhu
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Vivekananda Bhat
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Niranjana Sampathila
- Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Shashikiran Umakanth
- Department of Medicine, Dr. TMA Hospital, Manipal Academy of Higher Education, Manipal, India
- Sudhakara Upadya P
- Manipal School of Information Sciences, Manipal Academy of Higher Education, Manipal, India
3
Aghayev Z, Szafran AT, Tran A, Ganesh HS, Stossi F, Zhou L, Mancini MA, Pistikopoulos EN, Beykal B. Machine Learning Methods for Endocrine Disrupting Potential Identification Based on Single-Cell Data. Chem Eng Sci 2023; 281:119086. [PMID: 37637227 PMCID: PMC10448728 DOI: 10.1016/j.ces.2023.119086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 08/29/2023]
Abstract
Humans are continuously exposed to a variety of toxicants and chemicals, an exposure that is exacerbated during and after environmental catastrophes such as floods, earthquakes, and hurricanes. The hazardous chemical mixtures generated during these events threaten the health and safety of humans and other living organisms. This necessitates the development of rapid decision-making tools to help mitigate the adverse effects of exposure on the key modulators of the endocrine system, such as the estrogen receptor alpha (ERα). The mechanistic stages of the estrogenic transcriptional activity can be measured with high content/high throughput microscopy-based biosensor assays at the single-cell level, which generate millions of object-based minable data points. By combining computational modeling and experimental analysis, we built a highly accurate data-driven classification framework to assess the endocrine disrupting potential of environmental compounds. The effects of these compounds on the ERα pathway are predicted as being receptor agonists or antagonists using the principal component analysis (PCA) projections of high throughput, high content image analysis descriptors. The framework also combines rigorous preprocessing steps and nonlinear machine learning algorithms, such as the Support Vector Machines and Random Forest classifiers, to develop highly accurate mathematical representations of the separation between ERα agonists and antagonists. The results show that Support Vector Machines classify the unseen chemicals correctly with more than 96% accuracy using the proposed framework, where the preprocessing and the PCA steps play a key role in suppressing experimental noise and unraveling hidden patterns in the dataset.
Affiliation(s)
- Zahir Aghayev
- Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, CT
- Center for Clean Energy Engineering, University of Connecticut, Storrs, CT
- Adam T. Szafran
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX
- Anh Tran
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX
- Texas A&M Energy Institute, Texas A&M University, College Station, TX
- Hari S. Ganesh
- Discipline of Chemical Engineering, Indian Institute of Technology Gandhinagar, Palaj, Gandhinagar, Gujarat - 382055, India
- Fabio Stossi
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX
- GCC Center for Advanced Microscopy and Image Informatics, Houston, TX
- Lan Zhou
- Department of Statistics, Texas A&M University, College Station, TX
- Michael A. Mancini
- Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX
- GCC Center for Advanced Microscopy and Image Informatics, Houston, TX
- Efstratios N. Pistikopoulos
- Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX
- Texas A&M Energy Institute, Texas A&M University, College Station, TX
- Burcu Beykal
- Department of Chemical and Biomolecular Engineering, University of Connecticut, Storrs, CT
- Center for Clean Energy Engineering, University of Connecticut, Storrs, CT
4
Real-world-events data sifting through ultra-small labeled datasets and graph fusion. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2022.109865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
5
Dhillon SK, Ganggayah MD, Sinnadurai S, Lio P, Taib NA. Theory and Practice of Integrating Machine Learning and Conventional Statistics in Medical Data Analysis. Diagnostics (Basel) 2022; 12:2526. [PMID: 36292218 PMCID: PMC9601117 DOI: 10.3390/diagnostics12102526] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/19/2022] [Revised: 09/26/2022] [Accepted: 10/04/2022] [Indexed: 11/16/2022] Open
Abstract
The practice of medical decision making is changing rapidly with the development of innovative computing technologies. The growing interest in data analysis, together with improvements in big data computer processing methods, raises the question of whether machine learning can be integrated with conventional statistics in health research. To help address this knowledge gap, this paper presents a review on the conceptual integration between conventional statistics and machine learning, focusing on health research. The similarities and differences between the two are compared using mathematical concepts and algorithms. The comparison between conventional statistics and machine learning methods indicates that conventional statistics are the fundamental basis of machine learning, where the black box algorithms are derived from basic mathematics, but are advanced in terms of automated analysis, handling big data and providing interactive visualizations. While the nature of these two methods is different, they are conceptually similar. Based on our review, we conclude that conventional statistics and machine learning are best integrated to develop automated data analysis tools. We also strongly believe that machine learning could be explored by health researchers to enhance conventional statistics in decision making for added reliable validation measures.
Affiliation(s)
- Sarinder Kaur Dhillon
- Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Mogana Darshini Ganggayah
- Department of Econometrics and Business Statistics, School of Business, Monash University Malaysia, Kuala Lumpur 47500, Malaysia
- Siamala Sinnadurai
- Department of Population Medicine and Lifestyle Disease Prevention, Medical University of Bialystok, 15-269 Bialystok, Poland
- Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK
- Nur Aishah Taib
- Department of Surgery, Faculty of Medicine, Universiti Malaya, Kuala Lumpur 50603, Malaysia
6
Lohani S, Lukens J, Glasser RT, Searles TA, Kirby B. Data-Centric Machine Learning in Quantum Information Science. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac9036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
We propose a series of data-centric heuristics for improving the performance of machine learning systems when applied to problems in quantum information science. In particular, we consider how systematic engineering of training sets can significantly enhance the accuracy of pre-trained neural networks used for quantum state reconstruction without altering the underlying architecture. We find that it is not always optimal to engineer training sets to exactly match the expected distribution of a target scenario, and instead, performance can be further improved by biasing the training set to be slightly more mixed than the target. This is due to the heterogeneity in the number of free variables required to describe states of different purity, and as a result, overall accuracy of the network improves when training sets of a fixed size focus on states with the least constrained free variables. For further clarity, we also include a "toy model" demonstration of how spurious correlations can inadvertently enter synthetic data sets used for training, how the performance of systems trained with these correlations can degrade dramatically, and how the inclusion of even relatively few counterexamples can effectively remedy such problems.
7
Zhu LT, Chen XZ, Ouyang B, Yan WC, Lei H, Chen Z, Luo ZH. Review of Machine Learning for Hydrodynamics, Transport, and Reactions in Multiphase Flows and Reactors. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c01036] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Affiliation(s)
- Li-Tao Zhu
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Xi-Zhong Chen
- Department of Chemical and Biological Engineering, University of Sheffield, Sheffield, S1 3JD, U.K.
- Bo Ouyang
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Wei-Cheng Yan
- School of Chemistry and Chemical Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
- He Lei
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Zhe Chen
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
8
Nogueira IBR, Santana VV, Ribeiro AM, Rodrigues AE. Using Scientific Machine Learning to Develop Universal Differential Equation for Multicomponent Adsorption Separation Systems. CAN J CHEM ENG 2022. [DOI: 10.1002/cjce.24495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Idelfonso B. R. Nogueira
- Laboratory of Separation and Reaction Engineering, Associate Laboratory LSRE/LCM Department of Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias, 4200‐465, Porto Portugal
- ALiCE—Associate Laboratory in Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias Porto Portugal
- Vinicius V. Santana
- Laboratory of Separation and Reaction Engineering, Associate Laboratory LSRE/LCM Department of Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias, 4200‐465, Porto Portugal
- ALiCE—Associate Laboratory in Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias Porto Portugal
- Ana M. Ribeiro
- Laboratory of Separation and Reaction Engineering, Associate Laboratory LSRE/LCM Department of Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias, 4200‐465, Porto Portugal
- ALiCE—Associate Laboratory in Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias Porto Portugal
- Alírio E. Rodrigues
- Laboratory of Separation and Reaction Engineering, Associate Laboratory LSRE/LCM Department of Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias, 4200‐465, Porto Portugal
- ALiCE—Associate Laboratory in Chemical Engineering, Faculty of Engineering University of Porto, Rua Dr. Roberto Frias Porto Portugal
9
Nandakumar K, Tyagi M, Xu Y, Valsaraj KT, Joshi JB. Chemical Engineering at Crossroads. CAN J CHEM ENG 2022. [DOI: 10.1002/cjce.24506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- K. Nandakumar
- Cain Department of Chemical Engineering Louisiana State University Baton Rouge LA USA
- Mayank Tyagi
- Cain Department of Chemical Engineering Louisiana State University Baton Rouge LA USA
- Ye Xu
- Cain Department of Chemical Engineering Louisiana State University Baton Rouge LA USA
- K. T. Valsaraj
- Cain Department of Chemical Engineering Louisiana State University Baton Rouge LA USA
- J. B. Joshi
- J. B. Joshi Research Foundation, 401, Shubh Ashirwad Society, 5th Lane, Hindu Colony, Dadar (E) Mumbai India
10
Shi Y, Wang J, Wang Q, Jia Q, Yan F, Luo ZH, Zhou YN. Supervised Machine Learning Algorithms for Predicting Rate Constants of Ozone Reaction with Micropollutants. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.1c04697] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Affiliation(s)
- Yajuan Shi
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Jiang Wang
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Qiang Wang
- School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin, 300457, P. R. China
- Qingzhu Jia
- School of Marine and Environmental Science, Tianjin University of Science and Technology, Tianjin, 300457, P. R. China
- Fangyou Yan
- School of Chemical Engineering and Material Science, Tianjin University of Science and Technology, Tianjin, 300457, P. R. China
- Zheng-Hong Luo
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China
- Yin-Ning Zhou
- Department of Chemical Engineering, School of Chemistry and Chemical Engineering, State Key Laboratory of Metal Matrix Composites, Shanghai Jiao Tong University, Shanghai, 200240, P. R. China