1
A survey on multi-objective hyperparameter optimization algorithms for machine learning. Artif Intell Rev 2022. DOI: 10.1007/s10462-022-10359-2.
Abstract
Hyperparameter optimization (HPO) is a necessary step to ensure the best possible performance of Machine Learning (ML) algorithms. Several methods have been developed to perform HPO; most of these focus on optimizing one performance measure (usually an error-based measure), and the literature on such single-objective HPO problems is vast. Recently, though, algorithms have appeared that focus on optimizing multiple conflicting objectives simultaneously. This article presents a systematic survey of the literature published between 2014 and 2020 on multi-objective HPO algorithms, distinguishing between metaheuristic-based algorithms, metamodel-based algorithms, and approaches using a mixture of both. We also discuss the quality metrics used to compare multi-objective HPO procedures and present future research directions.
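Among the quality metrics used to compare multi-objective HPO procedures, the hypervolume indicator is a common choice. The sketch below computes it for two minimized objectives; the Pareto front values and reference point are hypothetical illustrations, not taken from the survey.

```python
# Hypervolume of a 2-objective (minimization) Pareto front relative to a
# reference point: the dominated area is summed as horizontal slabs between
# consecutive front points.
def hypervolume_2d(front, ref):
    # Sort the non-dominated points by the first objective (ascending);
    # for a valid front the second objective is then descending.
    pts = sorted(front)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

# Hypothetical front: each point trades validation error against model size.
front = [(0.10, 8.0), (0.20, 5.0), (0.40, 2.0)]
print(hypervolume_2d(front, ref=(1.0, 10.0)))
```

A larger hypervolume means the front dominates more of the objective space, so it rewards both convergence and spread in a single scalar.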
2
Accurate discharge and water level forecasting using ensemble learning with genetic algorithm and singular spectrum analysis-based denoising. Sci Rep 2022; 12:19870. PMID: 36400829; PMCID: PMC9674858; DOI: 10.1038/s41598-022-22057-8. Received 04/07/2022; accepted 10/10/2022. Open access.
Abstract
Forecasting discharge (Q) and water level (H) is essential in hydrological research and flood prediction. In recent years, deep learning has emerged as a viable technique for capturing the non-linear relationships in historical data and generating highly accurate predictions. Despite its success in various domains, applying deep learning to Q and H prediction is hampered by three critical issues: a shortage of training data, noise in the collected data, and the difficulty of tuning the model's hyper-parameters. This work proposes a novel deep learning-based Q–H prediction model that addresses all of these shortcomings. Specifically, to address data scarcity and increase prediction accuracy, we design an ensemble learning architecture that takes advantage of multiple deep learning techniques. Furthermore, we leverage Singular-Spectrum Analysis (SSA) to remove noise and outliers from the original data. We also exploit a Genetic Algorithm (GA) in a novel mechanism that automatically determines the prediction model's optimal hyper-parameters. We conducted extensive experiments on two datasets collected from Vietnam's Red and Dakbla rivers. The results show that our proposed solution outperforms current techniques across a wide range of metrics, including NSE, MSE, MAE, and MAPE. Specifically, the ensemble learning technique improves the NSE by at least 2%; the SSA-based data preprocessing further enhances the NSE by more than 5%; and GA-based optimization increases the NSE by at least 6%, and up to 40% in the best case.
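A GA-based hyper-parameter search of the kind described above can be sketched in a few lines. The search space, fitness function, and GA settings below are hypothetical stand-ins: in practice the fitness would train the prediction model and score it (e.g. by NSE).

```python
import random

# Hypothetical hyper-parameter space: (low, high) bounds per parameter.
SPACE = {"lr": (1e-4, 1e-1), "units": (16, 256), "window": (4, 64)}

def fitness(h):
    # Placeholder objective peaking at lr=0.01, units=128, window=24;
    # a real run would train and validate the model here.
    return -((h["lr"] - 0.01) ** 2 + ((h["units"] - 128) / 256) ** 2
             + ((h["window"] - 24) / 64) ** 2)

def random_individual():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}

def mutate(h, rate=0.2):
    # Gaussian perturbation, clamped to the parameter bounds.
    child = dict(h)
    for k, (lo, hi) in SPACE.items():
        if random.random() < rate:
            child[k] = min(hi, max(lo, child[k] + random.gauss(0, (hi - lo) * 0.1)))
    return child

def crossover(a, b):
    # Uniform crossover: each gene comes from one of the two parents.
    return {k: random.choice((a[k], b[k])) for k in SPACE}

def ga_search(pop_size=20, generations=30, seed=0):
    random.seed(seed)
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]          # elitism: keep the best quarter
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

print(ga_search())
```

Because the best individuals are carried over unchanged, the best fitness found never degrades between generations.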
3
Kaveh M, Mesgari MS. Application of Meta-Heuristic Algorithms for Training Neural Networks and Deep Learning Architectures: A Comprehensive Review. Neural Process Lett 2022; 55:1-104. PMID: 36339645; PMCID: PMC9628382; DOI: 10.1007/s11063-022-11055-6. Accepted 10/11/2022.
Abstract
The learning process and hyper-parameter optimization of artificial neural networks (ANNs) and deep learning (DL) architectures are considered among the most challenging machine learning problems. Several past studies have used gradient-based back-propagation methods to train DL architectures. However, gradient-based methods have major drawbacks: they can become stuck in local minima of multi-objective cost functions, they incur expensive execution times because gradient information must be computed over thousands of iterations, and they require the cost function to be continuous. Since training ANNs and DLs is an NP-hard optimization problem, interest in optimizing their structure and parameters with meta-heuristic (MH) algorithms has grown considerably. MH algorithms can formulate accurate estimates of DL components (such as hyper-parameters, weights, number of layers, number of neurons, learning rate, etc.). This paper provides a comprehensive review of the optimization of ANNs and DLs using MH algorithms. We review the latest developments in the use of MH algorithms in DL and ANN methods, present their advantages and disadvantages, and point out research directions to fill the gaps between MH and DL methods. We also explain that evolutionary hybrid architectures still have limited applicability in the literature, and we classify the latest MH algorithms to demonstrate their effectiveness in DL and ANN training for various applications. Most researchers extend novel hybrid algorithms by combining MHs to optimize the hyper-parameters of DLs and ANNs; developing hybrid MHs improves algorithm performance and makes it possible to solve complex optimization problems. In general, a well-performing MH should achieve a suitable trade-off between exploration and exploitation. Hence, this paper summarizes various MH algorithms in terms of convergence trend, exploration, exploitation, and the ability to avoid local minima. The integration of MHs with DLs is expected to accelerate the training process in the coming years, although relevant publications remain rare.
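As a minimal illustration of the gradient-free training the review surveys, the sketch below fits a tiny 2-2-1 network on XOR with a (1+1) evolution strategy. The architecture, dataset, and ES settings are illustrative assumptions, not taken from the paper; note that no gradients or continuity assumptions are needed.

```python
import math
import random

# XOR truth table: the classic task a linear model cannot fit.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(w, x):
    # w holds 9 weights: two hidden tanh neurons (3 each) and one
    # sigmoid output neuron (3), biases included.
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return 1 / (1 + math.exp(-(w[6] * h1 + w[7] * h2 + w[8])))

def loss(w):
    return sum((forward(w, x) - y) ** 2 for x, y in DATA)

def evolve(steps=5000, sigma=0.3, seed=1):
    # (1+1) evolution strategy: perturb all weights, keep the candidate
    # only if it improves the loss, so the loss never increases.
    random.seed(seed)
    w = [random.gauss(0, 1) for _ in range(9)]
    best = loss(w)
    for _ in range(steps):
        cand = [wi + random.gauss(0, sigma) for wi in w]
        cand_loss = loss(cand)
        if cand_loss < best:
            w, best = cand, cand_loss
    return w, best

w, err = evolve()
print(round(err, 4))
```

The same accept-if-better loop generalizes to the structural choices the review discusses (layer counts, neuron counts, learning rates), since nothing in it requires the search space to be differentiable.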
Affiliation(s)
- Mehrdad Kaveh: Department of Geodesy and Geomatics, K. N. Toosi University of Technology, Tehran, 19967-15433, Iran
- Mohammad Saadi Mesgari: Department of Geodesy and Geomatics, K. N. Toosi University of Technology, Tehran, 19967-15433, Iran
4
Chaturvedi P, Khan A, Tian M, Huerta EA, Zheng H. Inference-Optimized AI and High Performance Computing for Gravitational Wave Detection at Scale. Front Artif Intell 2022; 5:828672. PMID: 35252850; PMCID: PMC8889077; DOI: 10.3389/frai.2022.828672. Received 12/03/2021; accepted 01/12/2022. Open access.
Abstract
We introduce an ensemble of artificial intelligence models for gravitational wave detection that we trained on the Summit supercomputer using 32 nodes, equivalent to 192 NVIDIA V100 GPUs, within 2 h. Once fully trained, we optimized these models for accelerated inference using NVIDIA TensorRT. We deployed our inference-optimized AI ensemble on the ThetaGPU supercomputer at the Argonne Leadership Computing Facility to conduct distributed inference. Using the entire ThetaGPU supercomputer, consisting of 20 nodes, each with 8 NVIDIA A100 Tensor Core GPUs and 2 AMD Rome CPUs, our NVIDIA TensorRT-optimized AI ensemble processed an entire month of advanced LIGO data (including the Hanford and Livingston data streams) within 50 s. Our inference-optimized AI ensemble retains the same sensitivity as traditional AI models: it identifies all known binary black hole mergers previously identified in this advanced LIGO dataset and reports no misclassifications, while providing a 3X inference speedup compared to traditional artificial intelligence models. We used time slides to quantify the performance of our AI ensemble on up to 5 years' worth of advanced LIGO data. In this synthetically enhanced dataset, our AI ensemble reports an average of one misclassification for every month of searched advanced LIGO data. We also present the receiver operating characteristic curve of our AI ensemble using this 5-year-long advanced LIGO dataset. This approach provides the tools required to conduct accelerated, AI-driven gravitational wave detection at scale.
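One way an ensemble can suppress misclassifications is to report a candidate only when every model agrees. The sketch below illustrates this consensus strategy with hypothetical per-model probability streams; the paper's actual post-processing may differ.

```python
# Consensus detection over an ensemble: a time step is reported as a
# candidate event only if every model's probability clears the threshold,
# trading a little sensitivity for far fewer false alarms.
def detect(prob_streams, threshold=0.9):
    # prob_streams: one probability series per model, aligned in time.
    n = len(prob_streams[0])
    return [t for t in range(n)
            if all(stream[t] >= threshold for stream in prob_streams)]

# Hypothetical aligned outputs from two ensemble members.
model_a = [0.1, 0.95, 0.97, 0.2]
model_b = [0.2, 0.92, 0.50, 0.1]
print(detect([model_a, model_b]))  # only t=1 clears the threshold in both
```

Requiring unanimity makes a false alarm only as likely as the models' errors coinciding, which is one plausible reading of how an ensemble can report zero misclassifications while each member alone might not.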
Affiliation(s)
- Pranshu Chaturvedi: Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Asad Khan: Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, United States; Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Minyang Tian: National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, United States; Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- E. A. Huerta: Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States; Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL, United States; Department of Computer Science, University of Chicago, Chicago, IL, United States
- Huihuo Zheng: Leadership Computing Facility, Argonne National Laboratory, Lemont, IL, United States