01) Rainbow: Combining Improvements in Deep Reinforcement Learning. Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver. 2017.
02) Proximal Policy Optimization Algorithms. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. 2017.
03) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. 2018.
04) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis. 2017.
05) An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. Richard S. Sutton, A. Rupam Mahmood, Martha White. 2015.
06) Safe and Efficient Off-Policy Reinforcement Learning. Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare. 2016.
07) Convergence of Stochastic Iterative Dynamic Programming Algorithms. T. Jaakkola, M. I. Jordan, S. P. Singh. 1993.
08) Neuro-Dynamic Programming, Chap. 7. Dimitri Bertsekas, John Tsitsiklis. 1996.
09) From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Rémi Munos. 2014.
10) Deep Reinforcement Learning from Human Preferences. Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei. 2017.