Improving reinforcement learning algorithms: Towards optimal learning rate policies

In Mathematical Finance, Special Issue on Machine Learning in Finance

Keywords

Reinforcement Learning
Machine Learning
Market Microstructure
Optimal Trading
A line-search-like algorithm for RL and applications to execution
Authors

Charles-Albert Lehalle

Othmane Mounjid

Published

April 12, 2024

Abstract

This paper shows how to use results from statistical learning theory and stochastic algorithms to gain a better understanding of the convergence of Reinforcement Learning (RL) once it is formulated as a fixed-point problem. This understanding can be used to propose improvements to the learning rates used in RL. First, our analysis shows that the classical asymptotic convergence rate \(O(1/\sqrt{N})\) is pessimistic and can be replaced by \(O((\log(N)/N)^\beta)\) with \(1/2\leq\beta\leq 1\), where \(N\) is the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate used in RL. We decompose our policy into two interacting levels: the inner level and the outer level. In the inner level, we present the PASS algorithm (for “PAst Sign Search”) which, starting from a predefined sequence of learning rates, constructs a new sequence for which the error decreases faster. The convergence of PASS is proved and error bounds are established. In the outer level, we propose an optimal methodology for the selection of the predefined sequence. Third, we show empirically that our learning-rate selection methodology significantly outperforms standard RL algorithms on the following three applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
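To make the inner-level idea concrete, here is a minimal sketch of a sign-based learning-rate adaptation applied to the simplest of the three applications, drift estimation via a Robbins-Monro fixed-point iteration. The exact update rules of PASS are specified in the paper; the acceleration rule, the cap of 32 on the multiplier, and the base sequence \(c/n^\beta\) below are hypothetical illustrative choices, not the paper's calibrated ones.

```python
import numpy as np

rng = np.random.default_rng(0)

mu_true = 0.7      # unknown drift to estimate (ground truth for the demo)
sigma = 1.0        # noise level of the observations
n_iter = 10_000

# Predefined (outer-level) learning-rate sequence gamma_n = c / n^beta.
# c and beta are hypothetical choices for this sketch.
c, beta = 1.0, 0.75
base_rate = lambda n: c / (n + 1) ** beta

x = 0.0            # current estimate of the drift
scale = 1.0        # multiplicative correction built from past signs
prev_sign = 0.0

for n in range(n_iter):
    obs = mu_true + sigma * rng.standard_normal()
    incr = obs - x                      # stochastic fixed-point increment
    s = np.sign(incr)
    if s == prev_sign:
        scale = min(scale * 2.0, 32.0)  # repeated sign: likely far from the
                                        # target, so enlarge the step
    else:
        scale = 1.0                     # sign flip: likely oscillating around
                                        # the target, fall back to the base rate
    prev_sign = s
    x += scale * base_rate(n) * incr    # Robbins-Monro step with adapted rate

print(f"estimated drift: {x:.4f} (true value {mu_true})")
```

The design intuition, as described in the abstract, is that the sign pattern of past increments carries information: a long run of identical signs suggests the predefined rate is too small for the current distance to the fixed point, while alternating signs suggest the iterate is already fluctuating around it.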
