1. Hodjat B, Shahrzad H, Miikkulainen R. Domain-Independent Lifelong Problem Solving Through Distributed ALife Actors. Artificial Life 2024; 30:259-276. [PMID: 38048055] [DOI: 10.1162/artl_a_00418]
Abstract
A domain-independent problem-solving system based on principles of Artificial Life is introduced. In this system, DIAS, the input and output dimensions of the domain are laid out in a spatial medium. A population of actors, each seeing only part of this medium, solves problems collectively in it. The process is independent of the domain and can be implemented through different kinds of actors. Through a set of experiments on various problem domains, DIAS is shown able to solve problems with different dimensionality and complexity, to require no hyperparameter tuning for new problems, and to exhibit lifelong learning, that is, to adapt rapidly to run-time changes in the problem domain, and to do it better than a standard, noncollective approach. DIAS therefore demonstrates a role for ALife in building scalable, general, and adaptive problem-solving systems.
Affiliation(s)
- Hormoz Shahrzad
- Cognizant AI Labs; Department of Computer Science, University of Texas at Austin
- Risto Miikkulainen
- Cognizant AI Labs; Department of Computer Science, University of Texas at Austin
2. Lussange J, Vrizzi S, Palminteri S, Gutkin B. Mesoscale effects of trader learning behaviors in financial markets: A multi-agent reinforcement learning study. PLoS One 2024; 19:e0301141. [PMID: 38557590] [PMCID: PMC10984546] [DOI: 10.1371/journal.pone.0301141]
Abstract
Recent advances in the field of machine learning have yielded novel research perspectives in behavioural economics and financial market microstructure studies. In this paper, we study the impact of individual trader learning characteristics on markets, using a stock market simulator designed with a multi-agent architecture. Each agent, representing an autonomous investor, trades stocks through reinforcement learning, using a centralized double-auction limit order book. This approach allows us to study the impact of individual trader traits on the whole stock market at the mesoscale, in a bottom-up approach. We chose to test three trader trait aspects: increases in agent learning rates, herding behaviour, and random trading. As hypothesized, we find that larger learning rates significantly increase the number of crashes. We also find that herding behaviour undermines market stability, while random trading tends to preserve it.
Affiliation(s)
- Johann Lussange
- Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France
- Stefano Vrizzi
- Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France
- Stefano Palminteri
- Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France
- Center for Cognition and Decision Making, Department of Psychology, NU University Higher School of Economics, Moscow, Russia
- Boris Gutkin
- Laboratoire des Neurosciences Cognitives, Département des Études Cognitives, INSERM U960, Paris, France
- Center for Cognition and Decision Making, Department of Psychology, NU University Higher School of Economics, Moscow, Russia
3. Tsantekidis A, Passalis N, Tefas A. Modeling limit order trading with a continuous action policy for deep reinforcement learning. Neural Netw 2023; 165:506-515. [PMID: 37348431] [DOI: 10.1016/j.neunet.2023.05.051]
Abstract
Limit Orders allow buyers and sellers to set a "limit price" they are willing to accept in a trade. On the other hand, market orders allow for immediate execution at any price. Thus, market orders are susceptible to slippage, which is the additional cost incurred due to the unfavorable execution of a trade order. As a result, limit orders are often preferred, since they protect traders from excessive slippage costs due to larger than expected price fluctuations. Despite the price guarantees of limit orders, they are more complex compared to market orders. Orders with overly optimistic limit prices might never be executed, which increases the risk of employing limit orders in Machine Learning (ML)-based trading systems. Indeed, the current ML literature for trading almost exclusively relies on market orders. To overcome this limitation, a Deep Reinforcement Learning (DRL) approach is proposed to model trading agents that use limit orders. The proposed method (a) uses a framework that employs a continuous probability distribution to model limit prices, while (b) provides the ability to place market orders when the risk of no execution is more significant than the cost of slippage. Extensive experiments are conducted with multiple currency pairs, using hourly price intervals, validating the effectiveness of the proposed method and paving the way for introducing limit order modeling in DRL-based trading.
Affiliation(s)
- Avraam Tsantekidis
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
- Nikolaos Passalis
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
- Anastasios Tefas
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece.
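The continuous-action scheme described above can be sketched as follows. The Gaussian policy head, the externally supplied fill-probability estimate, and the fallback threshold are illustrative assumptions, not the authors' implementation:

```python
import random

def choose_order(mid_price, mu, sigma, fill_prob, min_fill_prob=0.3):
    """Sample a limit price from a continuous (Gaussian) policy head;
    fall back to a market order when the estimated probability of
    execution is too low to justify the risk of no fill."""
    if fill_prob < min_fill_prob:
        # accept slippage in exchange for guaranteed execution
        return ("market", mid_price)
    offset = random.gauss(mu, sigma)  # continuous action: relative price offset
    return ("limit", mid_price * (1 + offset))

random.seed(0)
order = choose_order(100.0, mu=-0.001, sigma=0.0005, fill_prob=0.8)
```

In a trained agent, `mu`, `sigma`, and the fill-probability estimate would come from the policy network rather than being fixed constants.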
4. James N, Menzies M. Collective Dynamics, Diversification and Optimal Portfolio Construction for Cryptocurrencies. Entropy (Basel) 2023; 25:931. [PMID: 37372275] [DOI: 10.3390/e25060931]
Abstract
Since its conception, the cryptocurrency market has been frequently described as an immature market, characterized by significant swings in volatility and occasionally described as lacking rhyme or reason. There has been great speculation as to what role it plays in a diversified portfolio. For instance, is cryptocurrency exposure an inflationary hedge or a speculative investment that follows broad market sentiment with amplified beta? We have recently explored similar questions with a clear focus on the equity market. There, our research revealed several noteworthy dynamics such as an increase in the market's collective strength and uniformity during crises, greater diversification benefits across equity sectors (rather than within them), and the existence of a "best value" portfolio of equities. In essence, we can now take any potential signatures of maturity we identify in the cryptocurrency market and contrast these with the substantially larger, older, and better-established equity market. This paper aims to investigate whether the cryptocurrency market has recently exhibited similar mathematical properties to the equity market. Instead of relying on traditional portfolio theory, which is grounded in the financial dynamics of equity securities, we adjust our experimental focus to capture the presumed behavioral purchasing patterns of retail cryptocurrency investors. Our focus is on collective dynamics and portfolio diversification in the cryptocurrency market, and on examining whether, and to what extent, previously established results in the equity market hold in the cryptocurrency market. The results reveal nuanced signatures of maturity relative to the equity market, including the fact that correlations collectively spike around exchange collapses, and identify an ideal portfolio size and spread across different groups of cryptocurrencies.
Affiliation(s)
- Nick James
- School of Mathematics and Statistics, University of Melbourne, Victoria 3010, Australia
- Max Menzies
- Beijing Institute of Mathematical Sciences and Applications, Tsinghua University, Beijing 101408, China
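The "collective strength" and correlation-spike findings above rest on pairwise correlations of asset return series; a minimal sketch of the underlying computation (illustrative data, not the authors' dataset or methodology):

```python
def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def mean_pairwise_corr(series):
    """Average pairwise correlation across assets; values near 1 indicate
    the collective co-movement seen around exchange collapses."""
    pairs = [(i, j) for i in range(len(series)) for j in range(i + 1, len(series))]
    return sum(pearson(series[i], series[j]) for i, j in pairs) / len(pairs)

# three hypothetical cryptocurrencies moving almost in lockstep
rets = [[0.01, -0.02, 0.03], [0.02, -0.04, 0.06], [0.01, -0.02, 0.025]]
c = mean_pairwise_corr(rets)
```

Computed over rolling windows, a spike in this statistic flags the crisis periods the abstract refers to.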
5. Online portfolio management via deep reinforcement learning with high-frequency data. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103247]
6. Sun S, Wang R, An B. Reinforcement Learning for Quantitative Trading. ACM Trans Intell Syst Technol 2023. [DOI: 10.1145/3582560]
Abstract
Quantitative trading (QT), which refers to the use of mathematical models and data-driven techniques in analyzing the financial market, has been a popular topic in both academia and the financial industry since the 1970s. In the last decade, reinforcement learning (RL) has garnered significant interest in many domains, such as robotics and video games, owing to its outstanding ability to solve complex sequential decision-making problems. RL's impact is pervasive, and it has recently demonstrated the ability to conquer many challenging QT tasks. Exploring the potential of RL techniques on QT tasks is a flourishing research direction. This paper aims to provide a comprehensive survey of research efforts on RL-based methods for QT tasks. More concretely, we devise a taxonomy of RL-based QT models, along with a comprehensive summary of the state of the art. Finally, we discuss current challenges and propose future research directions in this exciting field.
Affiliation(s)
- Shuo Sun
- Nanyang Technological University, Singapore
- Bo An
- Nanyang Technological University, Singapore
7. Bonetti M, Bisi L, Restelli M. Risk-Averse Optimization of Reward-based Coherent Risk Measures. Artif Intell 2023. [DOI: 10.1016/j.artint.2022.103845]
8. He FF, Chen CT, Huang SH. A multi-agent virtual market model for generalization in reinforcement learning based trading strategies. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.109985]
9. Kwak D, Choi S, Chang W. Self-attention based deep direct recurrent reinforcement learning with hybrid loss for trading signal generation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.042]
10. Bisi L, Santambrogio D, Sandrelli F, Tirinzoni A, Ziebart BD, Restelli M. Risk-averse policy optimization via risk-neutral policy optimization. Artif Intell 2022. [DOI: 10.1016/j.artint.2022.103765]
11. Gunjan A, Bhattacharyya S. A brief review of portfolio optimization techniques. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10273-7]
12. Liu Z, Luo H, Chen P, Xia Q, Gan Z, Shan W. An efficient isomorphic CNN-based prediction and decision framework for financial time series. Intell Data Anal 2022. [DOI: 10.3233/ida-216142]
Abstract
Financial time series prediction and trading decision-making are priorities of computational intelligence for researchers in academia and the finance industry, due to their broad application areas and substantial impact. However, these tasks remain challenging because financial time series exhibit various complex statistical properties, and the mechanism behind the underlying processes is unknown to a large extent. A significant number of machine learning-based methods have been proposed and demonstrate impressive results, especially deep learning-based models. Nevertheless, due to the high complexity of massive, nonlinear, and nonindependent data, and the difficulty and time consumption of training complicated deep learning models, the performance of online trading decisions is still inadequate for practical application. This paper proposes the Integrated Framework of Forecasting Based Online Trading Strategy (IFF-BOTS) to achieve better prediction performance and dynamic decision-making for real-world online trading systems. Our method adopts a novel isomorphic convolutional neural network (CNN)-based forecaster-classifier-executor architecture to exploit CNN-based integrated price-and-trend prediction and direct-reinforcement-learning-based trading decision-making. IFF-BOTS can also achieve better real-time performance for online trading. We empirically compare the proposed approach with state-of-the-art prediction and trading methods on real-world S&P and DJI datasets. The results show that IFF-BOTS outperforms its competitors in prediction metrics, trading profits, and real-time performance.
Affiliation(s)
- Zhongming Liu
- School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, China
- Hang Luo
- School of Economics, Xihua University, Chengdu, Sichuan, China
- Peng Chen
- School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, China
- Qibin Xia
- School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, China
- Zhihao Gan
- School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, China
- Wenyu Shan
- School of Computer and Software Engineering, Xihua University, Chengdu, Sichuan, China
13. Park K, Jung HG, Eom TS, Lee SW. Uncertainty-Aware Portfolio Management With Risk-Sensitive Multiagent Network. IEEE Trans Neural Netw Learn Syst 2022; PP:362-375. [PMID: 35604996] [DOI: 10.1109/tnnls.2022.3174642]
Abstract
As deep neural networks (DNNs) have gained considerable attention in recent years, there have been several cases applying DNNs to portfolio management (PM). Although some researchers have experimentally demonstrated its ability to make a profit, it is still insufficient to use in real situations because existing studies have failed to answer how risky investment decisions are. Furthermore, even though the objective of PM is to maximize returns within a risk tolerance, they overlook the predictive uncertainty of DNNs in the process of risk management. To overcome these limitations, we propose a novel framework called risk-sensitive multiagent network (RSMAN), which includes risk-sensitive agents (RSAs) and a risk adaptive portfolio generator (RAPG). Standard DNNs do not understand the risks of their decision, whereas RSA can take risk-sensitive decisions by estimating market uncertainty and parameter uncertainty. Acting as a trader, this agent is trained via reinforcement learning from dynamic trading simulations to estimate the distribution of reward and via unsupervised learning to assess parameter uncertainty without labeled data. We also present an RAPG that can generate a portfolio fitting the user's risk appetite without retraining by exploiting the estimated information from the RSAs. We tested our framework on the U.S. and Korean real financial markets to demonstrate the practicality of the RSMAN.
14. A Novel Trading Strategy Framework Based on Reinforcement Deep Learning for Financial Market Predictions. Mathematics 2021. [DOI: 10.3390/math9233094]
Abstract
The prediction of stocks is complicated by the dynamic, complex, and chaotic environment of the stock market. Investors put their money into the financial market, hoping to maximize profits by understanding market trends and designing trading strategies at the entry and exit points. Most studies propose machine learning models to predict stock prices. However, constructing trading strategies is helpful for traders to avoid making mistakes and losing money. We propose an automatic trading framework using LSTM combined with deep Q-learning to determine the trading signal and the size of the trading position. This is more sophisticated than traditional price prediction models. This study used price data from the Taiwan stock market, including daily opening price, closing price, highest price, lowest price, and trading volume. The profitability of the system was evaluated using a combination of different states of different stocks. The profitability of the proposed system was positive after a long period of testing, which means that the system performed well in predicting the rise and fall of stocks.
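The LSTM-plus-deep-Q-learning framework above ultimately rests on the standard Q-learning update; the sketch below shows that update in tabular form, with hypothetical market states and the classic buy/hold/sell action space (an illustration of the underlying rule, not the paper's network):

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) toward the Bellman target
    reward + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

# hypothetical two-state market with the three trading signals
actions = ("buy", "hold", "sell")
q = {s: {a: 0.0 for a in actions} for s in ("uptrend", "downtrend")}
q_update(q, "uptrend", "buy", reward=1.0, next_state="downtrend")
```

In the paper's setting, the table is replaced by a deep network fed with LSTM-encoded price windows, and the chosen action also determines the size of the trading position.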
15. Qiu Y, Qiu Y, Yuan Y, Chen Z, Lee R. QF-TraderNet: Intraday Trading via Deep Reinforcement With Quantum Price Levels Based Profit-And-Loss Control. Front Artif Intell 2021; 4:749878. [PMID: 34778753] [PMCID: PMC8586520] [DOI: 10.3389/frai.2021.749878]
Abstract
Reinforcement Learning (RL) based machine trading attracts a rich profusion of interest. However, in existing research, RL in the day-trading task suffers from noisy financial movements at short time scales, difficulty in order settlement, and expensive action search in a continuous-value space. This paper introduces an end-to-end RL intraday trading agent, QF-TraderNet, based on quantum finance theory (QFT) and deep reinforcement learning. We propose a novel design for the intraday RL trader's action space, inspired by Quantum Price Levels (QPLs). Our action space design also gives the model a learnable profit-and-loss control strategy. QF-TraderNet comprises two neural networks: 1) a long short-term memory network for feature learning of financial time series; 2) a policy generator network (PGN) for generating the distribution of actions. The profitability and robustness of QF-TraderNet have been verified on multiple types of financial datasets, including FOREX, metals, crude oil, and financial indices. The experimental results demonstrate that QF-TraderNet outperforms the other baselines in terms of cumulative price returns and Sharpe ratio, as well as in robustness under accidental market shifts.
Affiliation(s)
- Yifu Qiu
- Department of Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Zhuhai, China
- Yitao Qiu
- Department of Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Zhuhai, China
- Yicong Yuan
- Department of Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Zhuhai, China
- Zheng Chen
- Department of Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Zhuhai, China
- Raymond Lee
- Department of Computer Science and Technology, Division of Science and Technology, BNU-HKBU United International College, Zhuhai, China
17. Tsantekidis A, Passalis N, Toufa AS, Saitas-Zarkias K, Chairistanidis S, Tefas A. Price Trailing for Financial Trading Using Deep Reinforcement Learning. IEEE Trans Neural Netw Learn Syst 2021; 32:2837-2846. [PMID: 32516114] [DOI: 10.1109/tnnls.2020.2997523]
Abstract
Machine learning methods have recently seen a growing number of applications in financial trading. Being able to automatically extract patterns from past price data and consistently apply them in the future has been the focus of many quantitative trading applications. However, developing machine learning-based methods for financial trading is not straightforward, requiring carefully designed targets/rewards, hyperparameter fine-tuning, and so on. Furthermore, most of the existing methods are unable to effectively exploit the information available across various financial instruments. In this article, we propose a deep reinforcement learning-based approach, which ensures that consistent rewards are provided to the trading agent, mitigating the noisy nature of profit-and-loss rewards that are usually used. To this end, we employ a novel price trailing-based reward shaping approach, significantly improving the performance of the agent in terms of profit, Sharpe ratio, and maximum drawdown. Furthermore, we carefully designed a data preprocessing method that allows for training the agent on different FOREX currency pairs, providing a way for developing market-wide RL agents and allowing, at the same time, to exploit more powerful recurrent deep learning models without the risk of overfitting. The ability of the proposed methods to improve various performance metrics is demonstrated using a challenging large-scale data set, containing 28 instruments, provided by Speedlab AG.
18. Wu ME, Syu JH, Lin JCW, Ho JM. Portfolio management system in equity market neutral using reinforcement learning. Appl Intell 2021. [DOI: 10.1007/s10489-021-02262-0]
Abstract
Portfolio management involves position sizing and resource allocation. Traditional and generic portfolio strategies require forecasting of future stock prices as model inputs, which is not a trivial task since those values are difficult to obtain in the real-world applications. To overcome the above limitations and provide a better solution for portfolio management, we developed a Portfolio Management System (PMS) using reinforcement learning with two neural networks (CNN and RNN). A novel reward function involving Sharpe ratios is also proposed to evaluate the performance of the developed systems. Experimental results indicate that the PMS with the Sharpe ratio reward function exhibits outstanding performance, increasing return by 39.0% and decreasing drawdown by 13.7% on average compared to the reward function of trading return. In addition, the proposed model is more suitable for the construction of a reinforcement learning portfolio, but has 1.98 times more drawdown risk than the . Among the conducted datasets, the PMS outperforms the benchmark strategies in TW50 and traditional stocks, but is inferior to a benchmark strategy in the financial dataset. The PMS is profitable, effective, and offers lower investment risk among almost all datasets. The novel reward function involving the Sharpe ratio enhances performance, and well supports resource allocation for empirical stock trading.
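A Sharpe-ratio reward of the kind proposed above is typically computed over a window of realized trading returns; this is a minimal sketch (the window contents, sample standard deviation, and zero risk-free rate are assumptions, not the paper's exact formulation):

```python
import math

def sharpe_reward(returns, risk_free=0.0):
    """Sharpe-ratio-style reward over a window of trading returns:
    excess mean return divided by the sample standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    std = math.sqrt(var)
    return 0.0 if std == 0 else (mean - risk_free) / std  # guard flat windows

r = sharpe_reward([0.01, -0.005, 0.02, 0.0])
```

Compared with rewarding raw trading return, this reward penalizes volatile equity curves, which is the mechanism behind the reduced drawdown reported in the abstract.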
19. Tsantekidis A, Passalis N, Tefas A. Diversity-driven knowledge distillation for financial trading using Deep Reinforcement Learning. Neural Netw 2021; 140:193-202. [PMID: 33774425] [DOI: 10.1016/j.neunet.2021.02.026]
Abstract
Deep Reinforcement Learning (RL) is increasingly used for developing financial trading agents for a wide range of tasks. However, optimizing deep RL agents is notoriously difficult and unstable, especially in noisy financial environments, significantly hindering the performance of trading agents. In this work, we present a novel method that improves the training reliability of DRL trading agents, building upon the well-known approach of neural network distillation. In the proposed approach, teacher agents are trained on different subsets of the RL environment, thus diversifying the policies they learn. Then, student agents are trained using distillation from the trained teachers to guide the training process, allowing for better exploration of the solution space while "mimicking" an existing policy/trading strategy provided by the teacher model. The boost in effectiveness of the proposed method comes from the use of diversified ensembles of teachers trained to perform trading for different currencies. This enables us to transfer the common view regarding the most profitable policy to the student, further improving the training stability in noisy financial environments. In the conducted experiments, we find that when applying distillation, constraining the teacher models to be diversified can significantly improve the performance of the final student agents. We demonstrate this by providing an extensive evaluation on various financial trading tasks. Furthermore, we also provide additional experiments in the separate domain of control in games, using the Procgen environments, in order to demonstrate the generality of the proposed method.
Affiliation(s)
- Avraam Tsantekidis
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece.
- Nikolaos Passalis
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece.
- Anastasios Tefas
- School of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece.
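The distillation step can be sketched as a cross-entropy between the student's action distribution and an average of the diversified teachers' distributions (an illustrative formulation; the paper's exact loss and aggregation scheme may differ):

```python
import math

def distill_loss(student_probs, teacher_prob_sets):
    """Cross-entropy of the student's action distribution against the
    average of several teachers' distributions (e.g. one teacher per
    currency); lower means the student better matches the ensemble."""
    k = len(teacher_prob_sets)
    avg = [sum(t[i] for t in teacher_prob_sets) / k
           for i in range(len(student_probs))]
    return -sum(p * math.log(q) for p, q in zip(avg, student_probs))

teachers = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]  # diversified teacher policies
loss = distill_loss([0.6, 0.25, 0.15], teachers)
```

The loss is minimized when the student reproduces the teachers' averaged distribution, which is how the "common view" of the ensemble is transferred.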
21. Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning. Appl Intell 2021. [DOI: 10.1007/s10489-021-02218-4]
22. AbdelKawy R, Abdelmoez WM, Shoukry A. A synchronous deep reinforcement learning model for automated multi-stock trading. Prog Artif Intell 2021. [DOI: 10.1007/s13748-020-00225-z]
23.
Abstract
This study investigated the performance of a trading agent based on a convolutional neural network model in portfolio management. The results showed that with real-world data the agent could produce relevant trading results, while the agent’s behavior corresponded to that of a high-risk taker. The data used were wide in comparison with earlier reported research and was based on the full set of the S&P 500 stock data for twenty-one years supplemented with selected financial ratios. The results presented are new in terms of the size of the data set used and with regards to the model used. The results provide direction and offer insight into how deep learning methods may be used in constructing automatic trading systems.
24.
Abstract
We present a model for active trading based on reinforcement machine learning and apply this to five major cryptocurrencies in circulation. In relation to a buy-and-hold approach, we demonstrate how this model yields enhanced risk-adjusted returns and serves to reduce downside risk. These findings hold when accounting for actual transaction costs. We conclude that real-world portfolio management application of the model is viable, yet, performance can vary based on how it is calibrated in test samples.
25. Leem J, Kim HY. Action-specialized expert ensemble trading system with extended discrete action space using deep reinforcement learning. PLoS One 2020; 15:e0236178. [PMID: 32716945] [PMCID: PMC7384672] [DOI: 10.1371/journal.pone.0236178]
Abstract
Despite active research on trading systems based on reinforcement learning, the development and performance of research methods require improvements. This study proposes a new action-specialized expert ensemble method consisting of action-specialized expert models designed specifically for each reinforcement learning action: buy, hold, and sell. Models are constructed by examining and defining different reward values that correlate with each action under specific conditions, and investment behavior is reflected with each expert model. To verify the performance of this technique, profits of the proposed system are compared to those of single trading and common ensemble systems. To verify robustness and account for the extension of discrete action space, we compared and analyzed changes in profits of the three actions to our model's results. Furthermore, we checked for sensitivity with three different reward functions: profit, Sharpe ratio, and Sortino ratio. All experiments were conducted with S&P500, Hang Seng Index, and Eurostoxx50 data. The model was 39.1% and 21.6% more efficient than single and common ensemble models, respectively. Considering the extended discrete action space, the 3-action space was extended to 11- and 21-action spaces, and the cumulative returns increased by 427.2% and 856.7%, respectively. Results on reward functions indicated that our models are well trained; results of the Sharpe and Sortino ratios were better than the implementation of profit only, as in the single-model cases. The Sortino ratio was slightly better than the Sharpe ratio.
Affiliation(s)
- JoonBum Leem
- Department of Financial Engineering, Ajou University, Yeongtong-gu, Suwon, Republic of Korea
- Ha Young Kim
- Graduate School of Information, Yonsei University, Seodaemun-gu, Seoul, Republic of Korea
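Extending the discrete action space from 3 to 11 or 21 actions amounts to laying a finer grid over position sizes; one such mapping is sketched below (the interpretation of actions as fractions of a maximum long/short position is an assumption, not necessarily the paper's encoding):

```python
def action_to_position(idx, n_actions):
    """Map a discrete action index to a position size in [-1, 1]
    (-1 = full short, 0 = flat, +1 = full long). Requires an odd
    number of actions so a neutral 'hold' action exists."""
    assert n_actions >= 3 and n_actions % 2 == 1
    half = (n_actions - 1) / 2
    return (idx - half) / half

# the 3-action space recovers the classic sell / hold / buy signals
assert [action_to_position(i, 3) for i in range(3)] == [-1.0, 0.0, 1.0]
pos = action_to_position(15, 21)  # finer-grained sizing in a 21-action space
```

Widening the grid lets each expert express partial positions rather than all-or-nothing trades, which is consistent with the improved cumulative returns reported for the 11- and 21-action spaces.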
26. Calabuig J, Falciani H, Sánchez-Pérez E. Dreaming machine learning: Lipschitz extensions for reinforcement learning on financial markets. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.052]
27.
Abstract
Recently there has been an exponential increase in the use of artificial intelligence for trading in financial markets such as stock and forex. Reinforcement learning has become of particular interest to financial traders ever since the program AlphaGo defeated the strongest human contemporary Go board game player Lee Sedol in 2016. We systematically reviewed all recent stock/forex prediction or trading articles that used reinforcement learning as their primary machine learning method. All reviewed articles had some unrealistic assumptions such as no transaction costs, no liquidity issues and no bid or ask spread issues. Transaction costs had significant impacts on the profitability of the reinforcement learning algorithms compared with the baseline algorithms tested. Despite showing statistically significant profitability when reinforcement learning was used in comparison with baseline models in many studies, some showed no meaningful level of profitability, in particular with large changes in the price pattern between the system training and testing data. Furthermore, few performance comparisons between reinforcement learning and other sophisticated machine/deep learning models were provided. The impact of transaction costs, including the bid/ask spread on profitability has also been assessed. In conclusion, reinforcement learning in stock/forex trading is still in its early development and further research is needed to make it a reliable method in this domain.
Collapse
|
28
|
|
29
|
Gokcesu K, Kozat SS. An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:5565-5580. [PMID: 29994080 DOI: 10.1109/tnnls.2018.2806006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
We investigate the adversarial multiarmed bandit problem and introduce an online algorithm that asymptotically achieves the performance of the best switching bandit arm selection strategy. Our algorithms are truly online such that we do not use the game length or the number of switches of the best arm selection strategy in their constructions. Our results are guaranteed to hold in an individual sequence manner, since we have no statistical assumptions on the bandit arm losses. Our regret bounds, i.e., our performance bounds with respect to the best bandit arm selection strategy, are minimax optimal up to logarithmic terms. We achieve the minimax optimal regret with computational complexity only log-linear in the game length. Thus, our algorithms can be efficiently used in applications involving big data. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art switching bandit algorithms. We also introduce a general efficiently implementable bandit arm selection framework, which can be adapted to various applications.
Collapse
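The minimax guarantees above build on exponential-weighting schemes for adversarial bandits. A minimal sketch of plain Exp3 (not the authors' switching-arm algorithm; the fixed exploration rate and reward table are illustrative) shows the core importance-weighted update:

```python
import math
import random

def exp3(rewards, gamma=0.1, seed=0):
    """Exp3 on a fixed reward table: rewards[t][i] in [0, 1].

    Returns the total reward collected. Illustrative only: the paper's
    algorithm additionally tracks switching arm-selection strategies.
    """
    rng = random.Random(seed)
    k = len(rewards[0])
    weights = [1.0] * k
    total = 0.0
    for round_rewards in rewards:
        wsum = sum(weights)
        # Mix the exponential weights with uniform exploration.
        probs = [(1 - gamma) * w / wsum + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        r = round_rewards[arm]
        total += r
        # Importance-weighted update: only the played arm's weight moves.
        weights[arm] *= math.exp(gamma * (r / probs[arm]) / k)
    return total

# Arm 1 is always better; Exp3 should come to favour it.
table = [[0.2, 0.8] for _ in range(500)]
print(exp3(table))
```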
|
30
|
Emerging Technologies and Opportunities for Innovation in Financial Data Analytics: A Perspective. BIG DATA ANALYTICS 2018. [DOI: 10.1007/978-3-030-04780-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022] Open
|
31
|
Wang H, Huang T, Liao X, Abu-Rub H, Chen G. Reinforcement Learning for Constrained Energy Trading Games With Incomplete Information. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:3404-3416. [PMID: 28885145 DOI: 10.1109/tcyb.2016.2539300] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This paper considers the problem of designing adaptive learning algorithms to seek the Nash equilibrium (NE) of the constrained energy trading game among individually strategic players with incomplete information. In this game, each player uses the learning automaton scheme to generate the action probability distribution based on his/her private information for maximizing his own averaged utility. It is shown that if one of admissible mixed-strategies converges to the NE with probability one, then the averaged utility and trading quantity almost surely converge to their expected ones, respectively. For the given discontinuous pricing function, the utility function has already been proved to be upper semicontinuous and payoff secure which guarantee the existence of the mixed-strategy NE. By the strict diagonal concavity of the regularized Lagrange function, the uniqueness of NE is also guaranteed. Finally, an adaptive learning algorithm is provided to generate the strategy probability distribution for seeking the mixed-strategy NE.
Collapse
|
32
|
Deng Y, Bao F, Kong Y, Ren Z, Dai Q. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:653-664. [PMID: 26890927 DOI: 10.1109/tnnls.2016.2522401] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Can we train the computer to beat experienced traders for financial asset trading? In this paper, we try to address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically inspired learning concepts, deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with deep representations and makes trading decisions to accumulate the ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both the deep and recurrent structures. Hence, we propose a task-aware backpropagation through time method to cope with the gradient vanishing issue in deep training. The robustness of the neural system is verified on both the stock and the commodity future markets under broad testing conditions.
Collapse
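Direct RL methods of this kind optimize a trading reward itself rather than a price forecast. A minimal sketch of such a cumulative trading reward, with an illustrative proportional transaction cost (the paper's deep recurrent policy is not reproduced here):

```python
def trading_returns(prices, positions, cost=0.001):
    """Accumulated profit of a position sequence, charging a
    proportional transaction cost on each position change.
    Direct RL trains the policy to maximize exactly this kind of
    cumulative reward; the cost rate here is illustrative.
    """
    profit, prev = 0.0, 0
    for t in range(1, len(prices)):
        pos = positions[t - 1]  # position held over the interval (t-1, t]
        profit += pos * (prices[t] - prices[t - 1])
        profit -= cost * abs(pos - prev) * prices[t - 1]
        prev = pos
    return profit

prices = [100, 101, 103, 102, 104]
print(trading_returns(prices, [1, 1, -1, 1]))
```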
|
33
|
Fallahpour S, Hakimian H, Taheri K, Ramezanifar E. Pairs trading strategy optimization using the reinforcement learning method: a cointegration approach. Soft comput 2016. [DOI: 10.1007/s00500-016-2298-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
34
|
Mousavi S, Esfahanipour A, Zarandi MHF. A novel approach to dynamic portfolio trading system using multitree genetic programming. Knowl Based Syst 2014. [DOI: 10.1016/j.knosys.2014.04.018] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
35
|
Chen X, Gao Y, Wang R. Online selective kernel-based temporal difference learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2013; 24:1944-1956. [PMID: 24805214 DOI: 10.1109/tnnls.2013.2270561] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large-scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (i.e., a kernel distance-based online sparsification method) is proposed based on selective ensemble learning, which is computationally less complex compared with other sparsification methods. With the proposed sparsification method, the sparsified dictionary of samples is constructed online by checking if a sample needs to be added to the sparsified dictionary. In addition, based on local validity, a selective kernel-based value function is proposed to select the best samples from the sample dictionary for the selective kernel-based value function approximator. The parameters of the selective kernel-based value function are iteratively updated by using the temporal difference (TD) learning algorithm combined with the gradient descent technique. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). In addition, two typical experiments (Maze and Mountain Car) are used to compare with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of our proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both traditional and up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time compared with other sparsification methods, reaches a better local optimum than the traditional algorithms, and converges much faster than the up-to-date algorithms. In addition, OSKTD reaches a competitive final optimum compared with the up-to-date algorithms.
Collapse
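The TD update at the heart of OSKTD is the standard bootstrapped value update. A minimal tabular TD(0) sketch on a toy deterministic chain (the paper's kernel expansion and online sparsification are omitted):

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma_=0.9):
    """Tabular TD(0) on a deterministic chain: state i -> i+1, with
    reward 1 only on reaching the terminal state. OSKTD replaces this
    value table with a sparsified kernel expansion, but the TD update
    below is the same core step.
    """
    V = [0.0] * (n_states + 1)  # V[n_states] is terminal and stays 0
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            r = 1.0 if s_next == n_states else 0.0
            # TD(0): move V[s] toward the bootstrapped target.
            V[s] += alpha * (r + gamma_ * V[s_next] - V[s])
    return V

values = td0_chain()
print(values)
```

With one-step discounting, the learned values approach gamma^(distance to goal), increasing toward the rewarded terminal transition.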
|
36
|
Ou SL, Liu LYD, Ou YC. Using a genetic algorithm-based RAROC model for the performance and persistence of the funds. J Appl Stat 2013. [DOI: 10.1080/02664763.2013.856870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
37
|
Tung WL, Quek C. Financial volatility trading using a self-organising neural-fuzzy semantic network and option straddle-based approach. EXPERT SYSTEMS WITH APPLICATIONS 2011; 38:4668-4688. [PMID: 32288336 PMCID: PMC7126939 DOI: 10.1016/j.eswa.2010.07.116] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Financial volatility refers to the intensity of the fluctuations in the expected return on an investment or the pricing of a financial asset due to market uncertainties. Hence, volatility modeling and forecasting are imperative to financial market investors, as such projections allow the investors to adjust their trading strategies in anticipation of the impending financial market movements. Following this, financial volatility trading is the capitalization of the uncertainties of the financial markets to realize investment profits in times of rising, falling and side-way market conditions. In this paper, an intelligent straddle trading system (framework) that consists of a volatility projection module (VPM) and a trade decision module (TDM) is proposed for financial volatility trading via the buying and selling of option straddles to help a human trader capitalize on the underlying uncertainties of the Hong Kong stock market. Three different measures, namely: (1) the historical volatility (HV), (2) implied volatility (IV) and (3) model-based volatility (MV) of the Hang Seng Index (HSI) are employed to quantify the implicit volatility of the Hong Kong stock market. The TDM of the proposed straddle trading system combines the respective volatility measures with the well-established moving-average convergence/divergence (MACD) principle to recommend trading actions to a human trader dealing in HSI straddles. However, the inherent limitation of the MACD trading rule is that it generates time-delayed trading signals due to the use of moving averages, which are essentially lagging trend indicators. This drawback is intuitively addressed in the proposed straddle trading system by applying the VPM to compute future projections of the volatility measures of the HSI prior to the activation of the TDM. The VPM is realized by a self-organising neural-fuzzy semantic network named the evolving fuzzy semantic memory (eFSM) model. As compared to existing statistical and computational intelligence based modeling techniques currently employed for financial volatility modeling and forecasting, eFSM possesses several desirable attributes such as: (1) an evolvable knowledge base to continuously address the non-stationary characteristics of the Hong Kong stock market; (2) highly formalized human-like information computations; and (3) a transparent structure that can be interpreted via a set of linguistic IF-THEN semantic fuzzy rules. These qualities provide added credence to the computed HSI volatility projections. The volatility modeling and forecasting performances of the eFSM, when benchmarked to several established modeling techniques, as well as the observed trading returns of the proposed straddle trading system, are encouraging.
Collapse
Affiliation(s)
- W L Tung
- Centre for Computational Intelligence, Block N4 #2A-32, School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
| | - C Quek
- Centre for Computational Intelligence, Block N4 #2A-32, School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore
| |
Collapse
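The MACD rule the TDM builds on is easy to state concretely. A sketch with the conventional 12/26/9 EMA spans (the paper's exact parameters are not stated here):

```python
def ema(series, span):
    """Exponential moving average with the usual smoothing 2/(span+1)."""
    alpha = 2.0 / (span + 1)
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line). A cross of the MACD line above
    its signal line is the classic, lagging buy trigger that the
    straddle system's TDM starts from.
    """
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line

# A steady uptrend pulls the fast EMA above the slow one: positive MACD.
prices = [100 + 0.5 * t for t in range(60)]
m, s = macd(prices)
print(m[-1], s[-1])
```

Because both lines are built from moving averages, the signal necessarily trails the price series, which is exactly the lag the paper's volatility projection module is designed to offset.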
|
38
|
Abstract
Predictive models in regression and classification problems typically have a single model that covers most, if not all, cases in the data. At the opposite end of the spectrum is a collection of models, each of which covers a very small subset of the decision space. These are referred to as “small disjuncts.” The trade-offs between the two types of models have been well documented. Single models, especially linear ones, are easy to interpret and explain. In contrast, small disjuncts do not provide as clean or as simple an interpretation of the data, and have been shown by several researchers to be responsible for a disproportionately large number of errors when applied to out-of-sample data. This research provides a counterpoint, demonstrating that a portfolio of “simple” small disjuncts provides a credible model for financial market prediction, a problem with a high degree of noise. A related novel contribution of this article is a simple method for measuring the “yield” of a learning system, which is the percentage of in-sample performance that the learned model can be expected to realize on out-of-sample data. Curiously, such a measure is missing from the literature on regression learning algorithms. Pragmatically, the results suggest that for problems characterized by a high degree of noise and lack of a stable knowledge base it makes sense to reconstruct the portfolio of small rules periodically.
Collapse
|
39
|
Vassiliades V, Cleanthous A, Christodoulou C. Multiagent reinforcement learning: spiking and nonspiking agents in the iterated Prisoner's Dilemma. ACTA ACUST UNITED AC 2011; 22:639-53. [PMID: 21421435 DOI: 10.1109/tnn.2011.2111384] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
This paper investigates multiagent reinforcement learning (MARL) in a general-sum game where the payoffs' structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and nonspiking agents in the Iterated Prisoner's Dilemma by exploring the conditions required to enhance its cooperative outcome. The spiking agents are neural networks with leaky integrate-and-fire neurons trained with two different learning algorithms: 1) reinforcement of stochastic synaptic transmission, or 2) reward-modulated spike-timing-dependent plasticity with eligibility trace. The nonspiking agents use a tabular representation and are trained with Q- and SARSA learning algorithms, with a novel reward transformation process also being applied to the Q-learning agents. According to the results, the cooperative outcome is enhanced by: 1) transformed internal reinforcement signals and a combination of a high learning rate and a low discount factor with an appropriate exploration schedule in the case of nonspiking agents, and 2) having a longer eligibility trace time constant in the case of spiking agents. Moreover, it is shown that spiking and nonspiking agents have similar behavior and therefore they can equally well be used in a multiagent interaction setting. For training the spiking agents in the case where more than one output neuron competes for reinforcement, a novel and necessary modification that enhances competition is applied to the two learning algorithms utilized, in order to avoid a possible synaptic saturation. This is done by administering to the networks additional global reinforcement signals for every spike of the output neurons that were not "responsible" for the preceding decision.
Collapse
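The nonspiking side of this setup can be sketched with two tabular Q-learners playing the Iterated Prisoner's Dilemma. This is a crude memory-one simplification (standard payoffs, no reward transformation, no spiking networks); with it, mutual defection is the usual attractor, which is precisely why the paper investigates transformed reinforcement signals:

```python
import random

# Prisoner's Dilemma payoffs: (reward to 0, reward to 1); 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

def play_ipd(rounds=5000, alpha=0.2, gamma_=0.9, eps=0.1, seed=1):
    """Two epsilon-greedy tabular Q-learners; each agent's state is the
    opponent's previous action.
    """
    rng = random.Random(seed)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]  # q[agent][state][action]
    state = [0, 0]
    total = [0.0, 0.0]
    for _ in range(rounds):
        acts = []
        for i in range(2):
            if rng.random() < eps:
                acts.append(rng.randrange(2))
            else:
                row = q[i][state[i]]
                acts.append(0 if row[0] >= row[1] else 1)
        r = PAYOFF[(acts[0], acts[1])]
        for i in range(2):
            nxt = acts[1 - i]  # next state = what the opponent just did
            # Standard Q-learning update with a max over next-state actions.
            q[i][state[i]][acts[i]] += alpha * (
                r[i] + gamma_ * max(q[i][nxt]) - q[i][state[i]][acts[i]])
            total[i] += r[i]
            state[i] = nxt
    return q, total

q, total = play_ipd()
print(total)
```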
|
40
|
Joseph D, Gangadhar G, Srinivasa Chakravarthy V. ACE (Actor–Critic–Explorer) paradigm for reinforcement learning in basal ganglia: Highlighting the role of subthalamic and pallidal nuclei. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2010.03.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
41
|
Weissensteiner A. A Q-learning approach to derive optimal consumption and investment strategies. IEEE TRANSACTIONS ON NEURAL NETWORKS 2009; 20:1234-43. [PMID: 19497814 DOI: 10.1109/tnn.2009.2020850] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
In this paper, we consider optimal consumption and strategic asset allocation decisions of an investor with a finite planning horizon. A Q-learning approach is used to maximize the expected utility of consumption. The first part of the paper presents conceptually the implementation of Q-learning in a discrete state-action space and illustrates the relation of the technique to the dynamic programming method for a simplified setting. In the second part of the paper, different generalization methods are explored and, compared to other implementations using neural networks, a combination with self-organizing maps (SOMs) is proposed. The resulting policy is compared to alternative strategies.
Collapse
Affiliation(s)
- Alex Weissensteiner
- Department of Banking and Finance, University of Innsbruck, 6020 Innsbruck, Austria.
| |
Collapse
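The discrete state-action part of this approach can be illustrated with a toy finite-horizon consume-or-invest problem solved by tabular Q-learning. Everything here is an illustrative simplification (the state space, utilities, and dynamics are invented; the paper's SOM-based generalization is not reproduced):

```python
import random

def q_learning_consumption(episodes=3000, alpha=0.1, gamma_=0.95,
                           eps=0.2, seed=0):
    """Toy consume-or-invest problem over horizon T = 3.

    State: (period t, wealth w in {0..4}). Action 0 consumes one unit
    now (utility 1 if wealth is available); action 1 invests it
    (wealth grows by one, capped, with no immediate utility).
    """
    rng = random.Random(seed)
    T, W = 3, 5
    Q = {(t, w): [0.0, 0.0] for t in range(T) for w in range(W)}
    for _ in range(episodes):
        w = 2  # initial wealth
        for t in range(T):
            a = (rng.randrange(2) if rng.random() < eps
                 else (0 if Q[(t, w)][0] >= Q[(t, w)][1] else 1))
            if a == 0 and w > 0:
                r, w_next = 1.0, w - 1
            elif a == 1:
                r, w_next = 0.0, min(w + 1, W - 1)
            else:
                r, w_next = 0.0, w  # nothing left to consume
            # Bootstrap from the next period except at the horizon.
            target = r if t == T - 1 else r + gamma_ * max(Q[(t + 1, w_next)])
            Q[(t, w)][a] += alpha * (target - Q[(t, w)][a])
            w = w_next
    return Q

Q = q_learning_consumption()
print(Q[(0, 2)])
```

With these (assumed) dynamics, investing earns no return beyond the extra unit, so discounting makes immediate consumption the learned preference at the start state.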
|
42
|
Freitas FD, De Souza AF, de Almeida AR. Prediction-based portfolio optimization model using neural networks. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2008.08.019] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
43
|
|
44
|
Lee JW, Park J, O J, Lee J, Hong E. A Multiagent Approach to Q-Learning for Daily Stock Trading. ACTA ACUST UNITED AC 2007. [DOI: 10.1109/tsmca.2007.904825] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
45
|
|
46
|
Xu X, Hu D, Lu X. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning. ACTA ACUST UNITED AC 2007; 18:973-92. [PMID: 17668655 DOI: 10.1109/tnn.2007.899161] [Citation(s) in RCA: 167] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. By using KLSPI, near-optimal control policies can be obtained without much a priori knowledge on dynamic models of control plants. In KLSPI, Mercer kernels are used in the policy evaluation of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep the sparsity and improve the generalization ability of KLSTD-Q solutions, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared to the previous works on approximate RL methods, KLSPI makes two advances that eliminate the main difficulties of existing approaches. One is the better convergence and (near) optimality guarantee by using the KLSTD-Q algorithm for policy evaluation with high precision. The other is the automatic feature selection using the ALD-based kernel sparsification. Therefore, the KLSPI algorithm provides a general RL method with generalization performance and convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task for a stochastic chain problem demonstrate that KLSPI can consistently achieve better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was also evaluated on two nonlinear feedback control problems, including a ship heading control problem and the swing up control of a double-link underactuated pendulum called acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information of uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
Collapse
Affiliation(s)
- Xin Xu
- Institute of Automation, College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, P. R. China.
| | | | | |
Collapse
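The least squares temporal-difference step inside KLSTD-Q solves a linear system built from observed transitions. A minimal batch LSTD sketch with plain one-hot features (the paper's kernel feature space and ALD sparsification are omitted) on a three-state chain:

```python
def solve(A, b):
    """Tiny Gauss-Jordan elimination with partial pivoting (stdlib only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[c][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lstd(transitions, n_features, gamma_=0.9):
    """Batch LSTD: solve A w = b with
    A = sum phi(s) (phi(s) - gamma * phi(s'))^T,  b = sum phi(s) r.
    Here phi is a one-hot encoding, so w recovers the tabular values.
    """
    A = [[0.0] * n_features for _ in range(n_features)]
    b = [0.0] * n_features
    for s, r, s_next in transitions:
        phi = [1.0 if i == s else 0.0 for i in range(n_features)]
        phi_next = ([0.0] * n_features if s_next is None else
                    [1.0 if i == s_next else 0.0 for i in range(n_features)])
        for i in range(n_features):
            for j in range(n_features):
                A[i][j] += phi[i] * (phi[j] - gamma_ * phi_next[j])
            b[i] += phi[i] * r
    return solve(A, b)

# Chain 0 -> 1 -> 2 -> terminal, reward 1 on the final step.
data = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
w = lstd(data, 3)
print(w)
```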
|
47
|
Maximizing winning trades using a novel RSPOP fuzzy neural network intelligent stock trading system. APPL INTELL 2007. [DOI: 10.1007/s10489-007-0055-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
48
|
Solving Deep Memory POMDPs with Recurrent Policy Gradients. LECTURE NOTES IN COMPUTER SCIENCE 2007. [DOI: 10.1007/978-3-540-74690-4_71] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
|
49
|
Ang KK, Quek C. Stock trading using RSPOP: a novel rough set-based neuro-fuzzy approach. ACTA ACUST UNITED AC 2006; 17:1301-15. [PMID: 17001989 DOI: 10.1109/tnn.2006.875996] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
This paper investigates the method of forecasting stock price difference on artificially generated price series data using neuro-fuzzy systems and neural networks. As trading profits are more important to an investor than statistical performance, this paper proposes a novel rough set-based neuro-fuzzy stock trading decision model called stock trading using rough set-based pseudo outer-product (RSPOP), which synergizes the price difference forecast method with a forecast-bottleneck-free trading decision model. The proposed stock trading with forecast model uses the pseudo outer-product based fuzzy neural network using the compositional rule of inference [POPFNN-CRI(S)] with fuzzy rules identified using the RSPOP algorithm as the underlying predictor model and simple moving average trading rules in the stock trading decision model. Experimental results using the proposed stock trading with RSPOP forecast model on real world stock market data are presented. Trading profits in terms of portfolio end values obtained are benchmarked against stock trading with dynamic evolving neural-fuzzy inference system (DENFIS) forecast model, the stock trading without forecast model and the stock trading with ideal forecast model. Experimental results showed that the proposed model identified rules with greater interpretability and yielded significantly higher profits than the stock trading with DENFIS forecast model and the stock trading without forecast model.
Collapse
Affiliation(s)
- Kai Keng Ang
- Centre for Computational Intelligence, School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | | |
Collapse
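The "simple moving average trading rules" in the decision model can be sketched as a plain SMA crossover (the forecasting front-end, RSPOP/POPFNN, is not reproduced, and the window lengths here are illustrative):

```python
def sma(series, n):
    """Simple moving average; None until n samples are available."""
    return [None if i + 1 < n else sum(series[i + 1 - n:i + 1]) / n
            for i in range(len(series))]

def crossover_signals(prices, short=3, long=5):
    """Emit +1 (buy) when the short SMA crosses above the long SMA,
    -1 (sell) on the opposite cross, and 0 otherwise.
    """
    s, l = sma(prices, short), sma(prices, long)
    signals = [0] * len(prices)
    for t in range(1, len(prices)):
        if None in (s[t - 1], l[t - 1]):
            continue
        if s[t - 1] <= l[t - 1] and s[t] > l[t]:
            signals[t] = 1
        elif s[t - 1] >= l[t - 1] and s[t] < l[t]:
            signals[t] = -1
    return signals

# A dip followed by a rally should trigger a single buy signal.
prices = [10, 9, 8, 7, 8, 9, 11, 13, 15, 14]
print(crossover_signals(prices))
```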
|
50
|
O J, Lee J, Lee J, Zhang B. Adaptive stock trading with dynamic asset allocation using reinforcement learning. Inf Sci (N Y) 2006. [DOI: 10.1016/j.ins.2005.10.009] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|